Rockwell, Geoffrey, Department of Philosophy and Humanities Computing, University of Alberta, Canada, geoffrey.rockwell@ualberta.ca
Brown, Susan, University of Alberta and University of Guelph, Canada, sibrown@ualberta.ca
Chartrand, James, Open Sky Solutions, Canada, jc.chartrand@gmail.com
Hesemeier, Susan, University of Alberta, Canada, s.hesemeier@ualberta.ca

Introduction

The Canadian Writing Research Collaboratory (CWRC) has developed an in-browser text markup editor called CWRC-Writer for use by collaborative scholarly editing projects. This poster will demonstrate the editor and discuss the named entity annotation features that use stand-off RDF for text annotation. The combination of the poster and demonstration will:

  • Introduce CWRC-Writer so that attendees can try it on their own,
  • Show the hybrid markup model that combines in-text XML and stand-off RDF, and
  • Explain the agile development process followed and recruit testers.

We deliberately propose this as a poster for two reasons. First, we want to provide an opportunity for attendees to try CWRC-Writer. Second, we want to recruit a larger circle of individuals and projects willing to test it with real editing needs.1

CWRC-Writer

CWRC-Writer is an open-source scholarly editor that is undergoing extensive testing and real-world use in scholarly editing projects. CWRC-Writer has, or will have, the following features:

Figure 1: Screen shot of CWRC-Writer showing the tagging options

  • Close-to-WYSIWYG editing and enrichment of scholarly texts with meaningful visual representations of markup
  • Ability to add named entity annotations to texts
  • Ability to combine TEI markup for the text and stand-off RDF for named entities
  • Ability to export using ‘weavers’ that recombine the plain text, the TEI, and the RDF into different forms (including an embedded TEI-compliant XML)
  • Documented code to allow editorial projects to incorporate CWRC-Writer into their environments
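
As a rough illustration of what a 'weaver' might do, the following Python sketch embeds stand-off annotations into a plain text as inline elements. It is not CWRC-Writer's code (which is written in JavaScript): the `Annotation` class, the `weave_inline` function, and the `person:` identifiers are hypothetical names chosen for illustration, and the sketch assumes annotations that neither overlap nor nest.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr


@dataclass(frozen=True)
class Annotation:
    start: int  # offset of the first annotated character
    end: int    # offset one past the last annotated character
    tag: str    # element name to emit, e.g. the TEI element "persName"
    ref: str    # hypothetical authority identifier for the entity


def weave_inline(text: str, annotations: list[Annotation]) -> str:
    """Embed stand-off annotations as inline XML elements.

    Assumption: annotations neither overlap nor nest; anything else is rejected.
    """
    pieces = []
    cursor = 0
    for ann in sorted(annotations, key=lambda a: a.start):
        if ann.start < cursor:
            raise ValueError("overlapping annotations cannot be embedded inline")
        pieces.append(escape(text[cursor:ann.start]))           # text before the entity
        pieces.append(f"<{ann.tag} ref={quoteattr(ann.ref)}>")  # opening tag
        pieces.append(escape(text[ann.start:ann.end]))          # annotated text
        pieces.append(f"</{ann.tag}>")                          # closing tag
        cursor = ann.end
    pieces.append(escape(text[cursor:]))                        # trailing text
    return "".join(pieces)


text = "Wilfred Watson and Sheila Watson"
annotations = [
    Annotation(0, 14, "persName", "person:wilfred-watson"),
    Annotation(19, 32, "persName", "person:sheila-watson"),
]
print(weave_inline(text, annotations))
```

Keeping the plain text and the annotations separate until export is what would let different weavers produce different output forms from the same data.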

Background to CWRC-Writer

The Mashing Texts project, which was funded by the Social Sciences and Humanities Research Council of Canada, developed a prototype text collection and editing environment (called JiTR) that included a Java Web Start XML editor built on Eclipse by Open Sky Solutions.2 The idea was to test the viability of an easy-to-use editor that could be launched from a web-based collections manager with the text that the user wanted to edit. This meant that users would not have to buy and install a complicated editor. At the same time, Open Sky Solutions had been developing a similar XML editor for the Russell Letters project led by Dr. Nicholas Griffin at McMaster University.2 While these editors worked, they took too long to load over the Internet, and developments in browser-based JavaScript editors made it feasible to re-implement the editor as something easy to use in the browser itself.

CWRC, which is funded by the Canada Foundation for Innovation to develop a collaborative editing and publishing environment, is therefore developing an in-browser editor dubbed CWRC-Writer.3 CWRC has built the first usable version of the editor and is working on a second version for the spring of 2012. The development is led by James Chartrand of Open Sky Solutions. CWRC-Writer is based on TinyMCE, a JavaScript editor using jQuery to extend functionality.4

Agile Development Process

This project uses an agile development model to develop the editor in close consultation with CWRC partner projects and member projects. As part of the JiTR project, we developed personas and usage scenarios for those personas. CWRC, once funded, then developed specific use case scenarios for the XML editor with wireframes showing how it might be launched (from an editorial environment) and how it might look. Now we are developing this editor in iterations with input from partner projects that use it in their editing or born-digital writing. With the partners we follow an agile process that involves:

  1. Presenting prototypes to the partners with suggestions of what we want tested and where we need suggestions. Susan Hesemeier manages this process.
  2. Summarizing the feedback and prioritizing the next features to be developed. Dr. Rockwell and Dr. Brown do the prioritizing in consultation with the developer.
  3. Responding to queries as Open Sky Solutions develops the next iteration of the prototype.
  4. Having a researcher test the prototype first to address any obvious bugs so as not to waste partner time.
  5. Presenting it back to the partner participants to be used with their texts. The cycle then returns to step 1.

Each iteration takes about a month and we have completed three. Partner projects are committed to iterative development and have research assistants to help with testing in context.

Partner Projects

The partner projects for the first iterations include:

Orlando Project: This ongoing collaborative experiment in digital women’s literary history has since 1995 involved more than 100 people, many of them junior scholars, in using a custom semantic tagset based structurally on the TEI but specific to literary history. Orlando’s flagship publication appeared in 2006: Orlando: Women’s Writing in the British Isles is an on-line cultural history generated from the lives and works of over 1200 writers. Orlando continues to produce new materials.

Wilfred Watson and Sheila Watson Projects: The international Editing Modernism in Canada project is producing scholarly print and digital editions of texts by modernist Canadian authors. Through partnerships with several university libraries, University of Alberta Press and CWRC, the EMiC group at the University of Alberta, led by Dr. Paul Hjartarson, is producing digital and print editions of the literary manuscripts of Wilfred and Sheila Watson, who rank among the best late modernist writers in Canada.

Russell Letters Project: Philosopher and social critic Bertrand Russell was one of the twentieth century’s great letter writers and a highly prolific one. His letters are a hugely important resource for philosophers and historians and anyone interested in twentieth-century culture and politics. The Collected Letters of Bertrand Russell project, led by Dr. Nicholas Griffin, is digitizing, transcribing, annotating and indexing more than 40,000 letters from the Russell Archives to create an on-line electronic edition.

Canada’s Early Women Writers Project: Despite the prominence of star authors like Margaret Atwood, little is known of most of Canada’s earlier women writers. This project updates and expands a bio-bibliographical database of 470 Canadian women writers housed at Simon Fraser University. The enlarged semantically-tagged version (of well over 1000 names) will include all notable English-language writers active before 1950 who lived in or wrote about Canada.

Named Entity Annotation

One of the issues flagged early on was that many CWRC partner projects wanted sophisticated annotation for names, places, titles, organization names, dates, citations, and events, as well as great freedom for personal annotation, including overlapping annotations. There is also a desire for interoperability across projects, the coordination of authority lists for these entities, and the ability to harvest some annotations.

The solution is an editing tool built on a custom in-memory JavaScript data structure that can export structural markup conforming to schemas such as the TEI and Orlando ones, along with Open Annotation Collaboration (OAC) RDF for deeper semantic annotation (using controlled authority lists and vocabulary) of references to people, places, events, organizations, bibliographic material, and dates. CWRC-Writer thus supports the creation of both structural XML and RDF in a WYSIWYG environment (WYSIWYG in that the annotations are displayed in the editor as they might appear in the public interface). The formatting of the text within the editor relieves the user of the complexity of interacting directly with RDF formats and even with angle brackets, though both can be viewed on demand and edited by those comfortable with code. There is also provision for exporting enhanced text either as structural markup combined with representations of entities as (potentially overlapping) RDF or, for those who opt for hierarchical markup, entirely as nested tags. The poster will present this in detail with examples.
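
To make the distinction between the two export options concrete, here is a minimal Python sketch of how stand-off spans might be checked for overlap before choosing an export form. It does not reproduce CWRC-Writer's internal JavaScript data structure or a valid OAC serialization: the `Span` class, the function names, the `hasTarget`/`hasBody` keys (loosely modelled on OAC's target and body vocabulary), and all URIs are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int        # offset of the first annotated character
    end: int          # offset one past the last annotated character
    entity_type: str  # e.g. "organization" or "place"
    entity_uri: str   # hypothetical authority URI


def crosses(a: Span, b: Span) -> bool:
    """True when two spans partially overlap, so neither can nest in the other."""
    return a.start < b.start < a.end < b.end or b.start < a.start < b.end < a.end


def choose_export(spans: list[Span]) -> str:
    """Return "nested" if every span can become a nested tag, else "stand-off"."""
    for i, a in enumerate(spans):
        for b in spans[i + 1:]:
            if crosses(a, b):
                return "stand-off"
    return "nested"


def to_standoff(document_uri: str, span: Span) -> dict:
    """Describe one annotation as a plain record kept outside the text."""
    return {
        "type": span.entity_type,
        "hasTarget": {"source": document_uri, "start": span.start, "end": span.end},
        "hasBody": span.entity_uri,
    }


# Offsets refer to the string "University of Alberta Press".
nested = [
    Span(0, 27, "organization", "http://example.org/org/ua-press"),
    Span(14, 21, "place", "http://example.org/place/alberta"),
]
crossing = [
    Span(0, 21, "organization", "http://example.org/org/ualberta"),
    Span(14, 27, "organization", "http://example.org/org/ua-press"),
]
print(choose_export(nested))
print(choose_export(crossing))
print(to_standoff("http://example.org/doc/1", crossing[0]))
```

Because stand-off records point at character offsets rather than wrapping the text, crossing spans such as the second pair can coexist, which a strictly hierarchical XML encoding cannot express directly.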

Authority List Management and Lookup

CWRC-Writer provides forms to look up or construct annotations for people, places, events, organizations, dates, and bibliographic references. The annotations are applied much like formatting is applied in a WYSIWYG editor: the end user highlights the text to be annotated, then clicks on an icon to trigger the annotation lookup or edit form. We are now developing an API so that CWRC-Writer can retrieve a list of recommended entities to present to the user for selection and nominate new entities for inclusion. This is a first step towards a system that can manage people, places, organizations and other entities centrally across multiple projects. In the first instance, this will work with entity data from several CWRC pilot projects and be developed collaboratively with the Watsons project, but we are working towards use of Cool URIs so that CWRC entities can be exposed as linked open data and interact with other linked open data.
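
The lookup-and-nominate workflow described above could be sketched as follows. Since the API was still under development, nothing here reflects its actual design: the `AuthorityList` class, its `recommend` and `nominate` methods, the identifiers, and the simple substring matching are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityList:
    """A hypothetical in-memory authority list shared across projects."""
    entries: dict[str, str] = field(default_factory=dict)  # identifier -> label
    nominations: list[str] = field(default_factory=list)   # labels awaiting review

    def recommend(self, selected_text: str, limit: int = 5) -> list[tuple[str, str]]:
        """Return up to `limit` entries whose label contains the highlighted text."""
        needle = selected_text.strip().casefold()
        if not needle:
            return []
        matches = [
            (identifier, label)
            for identifier, label in self.entries.items()
            if needle in label.casefold()
        ]
        return sorted(matches, key=lambda match: match[1])[:limit]

    def nominate(self, label: str) -> None:
        """Queue a new entity for inclusion when no existing entry fits."""
        if label not in self.nominations:
            self.nominations.append(label)


people = AuthorityList(entries={
    "person:1": "Watson, Sheila",
    "person:2": "Watson, Wilfred",
    "person:3": "Russell, Bertrand",
})

# The user highlights "Watson" and is offered candidates to choose from.
print(people.recommend("Watson"))

# No candidate matches, so the user nominates a new entity instead.
if not people.recommend("Atwood"):
    people.nominate("Atwood, Margaret")
print(people.nominations)
```

In this sketch nominations are queued rather than added directly, on the assumption that a centrally managed list would want new entities reviewed before they become available to other projects.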

CWRC’s decision to design an editor that can be used without a full understanding of markup or RDF will undoubtedly be controversial, but we feel such an editor is needed by projects that bring on collaborators for specific tasks who are uninterested in the deeper technology, just as the accessibility of a web-based editor will be useful to many digital projects for light editing, correction, enhancement, and annotation of dynamic collections. We welcome the opportunity to engage in this debate by demonstrating the CWRC-Writer to interested members of the DH community.

References

CKEditor: http://ckeditor.com/

Collected Letters of Bertrand Russell project: http://russell.mcmaster.ca/brletters.htm

Cool URIs: http://www.w3.org/TR/cooluris/

CWRC: Canadian Writing Research Collaboratory: http://www.cwrc.ca/

CWRC-Writer: http://www.cwrc.ca/cwrcwriter

JiTR (Mashing Texts) project: http://tada.mcmaster.ca/Main/MashTexts

jQuery: http://jquery.com

Open Annotation Collaboration: http://www.openannotation.org/

Open Sky Solutions: http://www.openskysolutions.ca/

ORE, a component of the Open Archives Initiative: http://www.openarchives.org/ore/

Orlando Project: http://www.ualberta.ca/orlando and http://orlando.cambridge.org

TEI Lite: http://www.tei-c.org/Guidelines/Customization/Lite/

TinyMCE: http://www.tinymce.com/

Notes

1. For current information and to contact us, visit the CWRC-Writer site: http://www.cwrc.ca/cwrcwriter

2. See http://russell.mcmaster.ca/brletters.htm

3. See http://cwrc.ca

4. For TinyMCE see http://www.tinymce.com/. For jQuery see http://jquery.com.