Reside, Doug, New York Public Library for the Performing Arts, USA, dougreside@gmail.com
Fraistat, Neil, Maryland Institute for Technology in the Humanities (MITH), University of Maryland, USA, nfraistat@gmail.com
Vershbow, Ben, New York Public Library, USA, benjaminvershbow@nypl.org
van Zundert, Joris Job, Huygens Institute for the History of The Netherlands, The Netherlands, joris.van.zundert@gmail.com

Over the last five years, several national and supranational funding bodies have invested in large digital humanities infrastructure projects designed, in part, to reduce the proliferation of projects inventing and reinventing small, commonly needed tools. During the same period, several very modestly funded projects have tried to achieve the same results by holding working meetings, called ‘code sprints’, to develop commonly needed tools collaboratively. Interedition, the Center for History and New Media’s One Week | One Tool, MITH’s XML Barn-Raising, and the New York Public Library’s Tilden Papers Project have each hosted a series of code camps (also known as boot camps or code sprints). These events bring together humanities scholars with serious programming expertise to spend a week working on a tool that is valuable to each of them. The process has proven so productive that one of the large infrastructure projects, Project Bamboo, recently adopted it in ‘CorporaCamp.’ In this panel, we will examine the advantages and limitations of the small, agile methods of code sprints and how they may be supported by the large, sustainable infrastructures currently being built by larger projects.

The advantages of code sprints are numerous. Sprints focus on rapid prototyping. They gather scholars who are also coders, rather than people with only one of those skills. They also follow a policy of, to quote Dave Lester, ‘More hack, less yak!’ As a result, sprints quickly identify the real challenges facing academic software development and often make significant headway towards solving them. ‘One Week | One Tool’ produced Anthologize, which remains under active development and use to this day. Among other things, Interedition has produced CollateX, a modular automated collation workflow. The MITH barn raising produced ANGLES, a prototype of a web-based XML editor. NYPL’s code sprint significantly refactored the code of the Internet Archive’s BookReader tool and prepared it for future extensions by the participants. These sprints do more than build tools: they lay the groundwork for, and point the way to, the kind of infrastructure that is truly needed.

Nonetheless, these sprints occasionally encounter problems. Setting up code-sharing mechanisms, installing the necessary dependencies, and teaching libraries and platforms such as jQuery or Node.js to participants unfamiliar with them often absorbs a full day of work. Differing expectations and goals among participants can sometimes (though surprisingly rarely) threaten to derail progress. Documenting the work so that it can be taken up later, or by others, is sometimes not prioritized as much as it should be. Further, although there is often much left to do at the end of a sprint, participants frequently find it difficult to continue working once they return home, as competing and more immediate priorities take precedence. Large infrastructure projects such as Bamboo, DARIAH, and TextGrid could perhaps provide the organizational and administrative infrastructure needed to make code sprints more effective and their work more sustainable.

This panel will discuss four code sprints (outlined below), recount the lessons learned, and consider how big infrastructure projects and lightweight, rapid development efforts such as these can support each other.

Sample case studies:

MITH ‘Barn Raising’: In 2010, during his last weeks at the Maryland Institute for Technology in the Humanities, Doug Reside (Digital Curator for the Performing Arts, NYPL) organized a ‘barn raising’ to produce a web-based XML editor. The event brought several participants from Canada and other parts of the United States to College Park, Maryland. It also organized a group of coders who participated remotely via Skype and IRC. After the first two days of the sprint, the participants split into two teams. One focused on building a WYSIWYG editor, and the other aimed to replicate the core functionality of popular XML editors in a JavaScript-based web application. The second team consisted almost entirely of remote participants, yet it arguably wrote more code during the sprint than the team working together in the same room.

BookReader Sprint: In 2011, Benjamin Vershbow organized the New York Public Library’s code sprint to extend BookReader, the Internet Archive’s JavaScript widget. The goal of the project was to refactor the Internet Archive’s existing code to make it more modular and extensible. Around a dozen participants came to New York from libraries across North America for four days of hacking on the code base. To stay open to the many desired use cases of the BookReader, NYPL invited participants with a diverse set of skills and goals. In retrospect, this probably diverted some of the sprint’s productive energy into efforts that could not realistically be completed in a few days. However, the sprint is notable in that it began to extend an existing, heavily used code base supported by a major organization.

Interedition: Over the past three years, Joris van Zundert organized ten boot camps as part of the European COST Action IS0704 ‘Interedition’. Each boot camp brought together between 5 and 15 scholars, developers, and scholarly developers from the wider European region and the US. The boot camps focused on transcription, annotation, and collation. These are primary scholarly tasks in producing (digital) scholarly editions, and common models and tools could support them effectively. The Interedition boot camps have produced various new tools in the form of web services, of which CollateX is probably the best known. They have also made considerable progress in the development of existing tools (Juxta, eLaborate, etc.). Paradoxically, however, the production of tools is ‘just’ a side effect of the Interedition endeavor. Interedition’s main objective is to further the interoperability of the tools used to produce scholarly editions, as a means of enhancing the sustainability of both the tools and the digital editions. One of Interedition’s findings was that such sustainability depends on an academic platform that supports interaction and collaboration among researcher-developers in a very concrete way: interoperability and the integration of tools are best developed together.

CorporaCamp: Neil Fraistat and Seth Denbo organized CorporaCamp as part of Project Bamboo. CorporaCamp was a key step in the design process for Project Bamboo’s Corpora Space, which will enable the curation and exploration of data across the boundaries of large structured collections. The primary goal of CorporaCamp was to see whether, in three days, participants could build a prototype visualization and analysis tool that works across three different collections. The ‘work’ of the workshop involved building this tool, which we called WoodChipper, but the tool itself was only one of several important outcomes. CorporaCamp tested our assumptions about the larger design process for Bamboo Corpora Space. In addition, the workshop’s rapid development process constantly required us to weigh our long-term goal of experimenting with a distributed, extensible architecture against our desire to have a working prototype at the end of the three days. In many cases the team ran two development threads in parallel, with one group working on a more general solution and another on a simpler fallback. This process gave us a better sense of the problems and decisions, and the range of consequences of those decisions, that we would face in developing Bamboo’s architecture and applications.