Over the last five years, several national and supranational funding bodies have invested in large digital humanities infrastructure projects designed, in part, to reduce the proliferation of projects inventing and reinventing small, commonly needed tools. During the same period, several very modestly funded projects have attempted to achieve the same results by holding working meetings, called ‘code sprints’, to develop commonly needed tools collaboratively. Interedition, the Center for History and New Media’s One Week | One Tool, MITH’s XML Barn-Raising, and the New York Public Library’s Tilden Papers Project have each hosted a series of code camps (also known as boot camps or code sprints) that bring together humanities scholars with serious programming expertise to spend a week working on a tool of value to each of them. The process has proven so productive that one of the large infrastructure projects, Project Bamboo, has recently adopted it in ‘CorporaCamp’. In this panel, we will examine the advantages and limitations of the small, agile methods of code sprints and how they may be supported by the large, sustainable infrastructures currently under construction by larger projects.
The advantages of code sprints are numerous. By spending time on rapid prototyping, by gathering scholars who are also coders (rather than people with only one skill or the other), and by establishing a policy of, to quote Dave Lester, ‘More Hack, less yak!’, code sprints quickly identify the real challenges facing academic software development and often make significant headway toward solving them. One Week | One Tool produced Anthologize, which continues under active development and use to this day; Interedition has produced, among other things, CollateX (a modular automated collation workflow); the MITH barn raising produced ANGLES, a prototype web-based XML editor; and NYPL’s code sprint significantly refactored the code of the Internet Archive’s BookReader tool and prepared it for future extensions by the participants. These sprints not only build tools but also lay the groundwork for, and point the way toward, the sort of infrastructure that is truly needed.
Nonetheless, these sprints occasionally encounter problems. Setting up code-sharing mechanisms, installing needed dependencies, and teaching libraries and platforms such as jQuery or Node.js to participants unfamiliar with them often absorbs a full day of work. Differing expectations and goals among participants can sometimes (though surprisingly rarely) threaten to derail progress. Documenting work so that it can be taken up later, or by others, is not always given the priority it deserves. Further, although much work often remains at the end of a sprint, participants frequently find it difficult to continue once they return home, because competing and more immediate priorities take precedence. Large infrastructure projects such as Bamboo, DARIAH, and TextGrid may be able to provide the organizational and administrative infrastructure needed to make code sprints more effective and their work more sustainable.
This paper will discuss four code sprints (outlined below), recount ‘lessons learned’, and consider how big infrastructure projects and lightweight, rapid development efforts such as these may support each other.
Sample case studies:
MITH ‘Barn Raising’:
In 2010, Doug Reside (Digital Curator for the Performing Arts, NYPL) organized a ‘barn raising’ to produce a web-based XML editor during his last weeks at the Maryland Institute for Technology in the Humanities. The event brought several participants from Canada and elsewhere in the United States to College Park, Maryland, and also organized a group of coders to participate remotely via Skype and IRC. After the first two days of the sprint, the participants split into two teams: one concerned with building a WYSIWYG editor, and one wanting to replicate the core functionality of popular XML editors in a JavaScript-based web application. The second team consisted almost entirely of remote participants, but arguably produced more code in the course of the sprint than those working together in the same room.
BookReader Sprint: In 2011, Benjamin Vershbow organized the New York Public Library’s worksprint to extend the Internet Archive’s JavaScript widget, BookReader. The goal of the project was to refactor the Internet Archive’s existing code to make it more modular and extensible. Around a dozen participants from libraries across North America were brought to New York for four days of hacking on the code base. In an attempt to accommodate the many desired use cases of the BookReader, NYPL invited participants with a diverse set of skills and goals, which, in retrospect, probably dispersed some of the sprint’s productive energy into efforts that could not realistically be achieved in a few days. Nevertheless, the sprint is notable in that it began to extend an existing, heavily used code base supported by a major organization.
Interedition: Over the past three years, Joris van Zundert organized ten boot camps as part of the European COST Action IS0704 ‘Interedition’. Participation in the boot camps ranged from 5 to 15 scholars, developers, and scholarly developers from the wider European region and the US. The boot camps focused on transcription, annotation, and collation as primary scholarly tasks in producing (digital) scholarly editions that could be effectively supported by common models and tools. The Interedition boot camps have resulted in various new tools in the form of web services, of which CollateX is probably the best known, and in considerable progress in the development of existing tools (Juxta, eLaborate, etc.). Paradoxically, however, the production of tools is ‘just’ a side effect of the Interedition endeavor. Interedition’s main objective is to further the interoperability of tools used in the production of scholarly editions as a means of enhancing the sustainability of both tools and digital editions. One of Interedition’s findings was that such sustainability depends on an academic platform that supports interaction and collaboration among researcher-developers in the most concrete way: interoperable, integrated tool development is best done together.
CorporaCamp: Neil Fraistat and Seth Denbo organized CorporaCamp as part of Project Bamboo. CorporaCamp was a key step in the design process for Project Bamboo’s Corpora Space, which will enable the curation and exploration of data across the boundaries of large structured collections. The primary goal of CorporaCamp was to see whether, over three days, participants could build a prototype visualization and analysis tool that functioned across three different collections. While the ‘work’ of the workshop involved building this tool, which we called WoodChipper, the tool itself was only one of several important outcomes. CorporaCamp not only tested our assumptions about the larger design process for Bamboo Corpora Space; the rapid development process of the workshop also required us constantly to balance our long-term goals (such as experimenting with a distributed, extensible architecture) against our desire to have a working prototype implemented by the end of the three days. In many cases the team had two development threads running in parallel, with one group working on a more general solution and another on a simpler fallback. This process provided a better sense of the problems and decisions (and the range of consequences of those decisions) that we would face in developing the Bamboo architecture and applications.