Managing entities such as people, places and subjects across a large corpus of textual documents can be complicated. While the TEI guidelines offer a sound basis for encoding a great variety of textual material, there does not seem to be general agreement on how to manage information that goes beyond the text, such as entity information and relationships between entities.
At the Department of Digital Humanities at King’s College London, past solutions for entity management have included the implementation of the following:
But all of these seemed either to be too simplistic and hard to maintain, or too complicated for the requirements we had in the Gascon Rolls project.2
In this abstract we demonstrate how we applied the entity management tool EATS (Entity Authority Tool Set),3 in the context of this project, to manage entity information successfully.
The Gascon Rolls Project (1317-1468) is an AHRC-funded collaborative venture between the Universities of Oxford and Liverpool and the Department of Digital Humanities at King’s College London. It began in October 2008 and is due to end in December 2011.
The main aim is to make the unpublished records of the English Government of the Duchy of Aquitaine (1154-1453)4 available to everyone, both in electronic and printed forms.
The corpus of the Gascon Rolls consists of one hundred and twelve rolls, each containing up to 67 membranes. The membranes carry enrolments by the English royal Chancery of letters, writs, mandates, confirmations, inspeximuses, and other documents issued by, and in the name of, the Plantagenet and Lancastrian king-dukes for their Gascon lands and subjects. The rolls contain a large body of information, about people and places in particular, that needs to be harvested in order to offer sophisticated indexes and advanced search functions, which can give a different research experience to the scholar approaching the online resource. At the time of writing there are 4381 people entities and 2842 place entities in EATS, and there is no issue with having many thousands more. By comparison, the EATS installation at the New Zealand Electronic Text Centre contains almost a hundred thousand entities, drawn from thousands of electronic texts.5
The open access online resource offers images of all the unpublished Gascon Rolls (1317-1468), an edition in calendar (summary) form, indexes of the persons, places and subjects mentioned, and advanced search features.
The calendar editing framework is based largely on what has been previously developed,6 7 for the Fine Rolls of Henry III project.8 It uses a customised subset of the TEI P5 guidelines;9 these have been adapted to suit the particular needs of the structural variety presented by the Gascon Rolls.
Calendars are encoded directly in XML. Information is captured about structure (rolls, membranes, entries, openers and closers), dates, people and their offices, places and subjects.
This allows the creation of a number of displays, in both electronic and printed formats, and forms a basis for the construction of indexes and search facilities.
Information about people and their offices, places and subjects is encoded in-line, extracted from the text using a custom-built oXygen XML Editor plugin,10 and added to EATS, where each entity is given a unique identifier (URI).
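As an illustration of the extraction step described above, the following is a minimal sketch in Python, not the project's oXygen plugin. It assumes that in-line names are marked up with the TEI elements persName and placeName and that each carries its entity URI in a ref attribute; the sample entry, the names in it and the example.org URIs are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample entry; the `ref` URIs are placeholders, not real EATS identifiers.
SAMPLE_ENTRY = """<ab xmlns="http://www.tei-c.org/ns/1.0" type="entry">
  Grant to <persName ref="http://example.org/eats/entity/101/">William
  Example</persName>, merchant of
  <placeName ref="http://example.org/eats/entity/202/">Bordeaux</placeName>.
</ab>"""

ENTITY_ELEMENTS = ("persName", "placeName")


def extract_entity_refs(xml_text):
    """Return (element name, name as written, entity URI) for each keyed name."""
    root = ET.fromstring(xml_text)
    refs = []
    for elem in root.iter():
        local_name = elem.tag.split("}")[-1]  # drop the namespace prefix
        uri = elem.get("ref")
        if local_name in ENTITY_ELEMENTS and uri:
            # Collapse line breaks and runs of spaces inside the name.
            written = " ".join("".join(elem.itertext()).split())
            refs.append((local_name, written, uri))
    return refs


entity_refs = extract_entity_refs(SAMPLE_ENTRY)
for kind, written, uri in entity_refs:
    print(kind, written, uri)
```

Each tuple pairs the form of the name found in the text with the identifier of the entity it refers to, which is the information an index or search facility needs.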
Figure 1: Graphic display of the Gascon Rolls editorial framework
EATS is a web application for recording, editing, using and displaying authority information about entities.11 It is designed to allow multiple authorities to each maintain their own independent data, while operating on a common base so that information about the same entity is all in one place. It can be accessed by multiple users simultaneously, allowing the research team members to work independently and collaboratively at the same time.
By using a central entity management system to capture information about people, places and subjects mentioned in the rolls, it is possible to track a single entity throughout all the rolls and establish relationships between the entities.
Relationships, along with the variant names of entities, are crucial elements in entity management, since they are not amenable to discovery through simple search interfaces. The Gascon Rolls project harvests relationships from two sources: those explicitly created in EATS by the researchers, and those that can be inferred from TEI markup within the texts. The former encompasses information that is not present in the text, or which is not bounded by time or circumstance; for example, familial relationships. The latter derives from nested name markup, for example specifying that a person is from a particular place.
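The inference from nested name markup can be sketched as follows. This is an illustration under stated assumptions rather than the project's actual code: it assumes a placeName nested inside a persName, with both elements carrying an entity URI in a ref attribute. The sample text, the URIs and the relationship label "associated with place" are invented.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented sample: a place name nested inside a person name.
SAMPLE_NESTED = """<ab xmlns="http://www.tei-c.org/ns/1.0">
  Order to <persName ref="http://example.org/eats/entity/101/">William Example of
  <placeName ref="http://example.org/eats/entity/202/">Bordeaux</placeName></persName>.
</ab>"""


def infer_relationships(xml_text, label="associated with place"):
    """Yield one (person URI, label, place URI) triple per nested place name."""
    root = ET.fromstring(xml_text)
    triples = []
    for person in root.iter(TEI + "persName"):
        person_uri = person.get("ref")
        if not person_uri:
            continue  # an unkeyed name cannot be tied to an entity
        for place in person.iter(TEI + "placeName"):
            place_uri = place.get("ref")
            if place_uri:
                triples.append((person_uri, label, place_uri))
    return triples


relationships = infer_relationships(SAMPLE_NESTED)
print(relationships)
```

A relationship inferred in this way holds only for the entry in which the markup occurs, which is why it is kept distinct from the relationships entered directly in EATS.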
The publishing framework created at the Department of Digital Humanities at King’s College London, xMod,12 generates the output for the different calendar reading views and for the single entity pages which are the basis for online and printed indexes. The editorial framework and the entity management system together form a flexible structure that can be reapplied for the online and printed publication of other historical sources of the same nature.
One advantage of using EATS over the other entity management options mentioned above is that it can be accessed directly from the XML editor through the plugin, which reduces the time needed to add or edit information compared with using a separate tool. Another advantage is that, because EATS is a database at its core, the information it stores can easily be exported into other formats as desired, such as RDF/OWL and Topic Maps serialisations.
Further, since the EATS installation is not tightly coupled to the other components of the Gascon Rolls project – that is, it does not reference them, but is only referenced – its information can be easily reused in other projects. Indeed, such projects could use the same installation and add their own information about entities, without causing any changes to the information used by the Gascon Rolls project. Data can be shared without being intermingled.
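The idea of sharing data without intermingling it can be illustrated with a small sketch. This is not the EATS data model; it is a hypothetical structure in which every statement about an entity is filed under the authority that made it. The class, the authority names and the URI are all invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One shared entity; statements about it are stored per authority."""
    uri: str
    # authority name -> list of (property, value) statements made by that authority
    assertions: dict = field(default_factory=dict)

    def add(self, authority, prop, value):
        self.assertions.setdefault(authority, []).append((prop, value))

    def by_authority(self, authority):
        """Only the statements made by one authority."""
        return list(self.assertions.get(authority, []))

    def all_values(self, prop):
        """Every value recorded for a property, across all authorities."""
        return [v for facts in self.assertions.values() for p, v in facts if p == prop]


place = Entity("http://example.org/eats/entity/202/")
place.add("Gascon Rolls", "name", "Burdegala")
place.add("Another Project", "name", "Bordeaux")

print(place.by_authority("Gascon Rolls"))  # the first project's view is unchanged
print(place.all_values("name"))            # the shared view sees both names
```

A second project adding its own name for the place leaves the first project's statements untouched, while a search across all authorities can still find the entity under either name.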
There are some downsides to the current version of EATS, however. When adding more complex information about an entity, such as relationships, the users found it complicated to use at first. It is not easy to add some rich information, such as ternary relationships (Person A related to Person B in Place C), although this was never a problem in our case. It is not possible to define entity type hierarchies; for instance, County cannot be declared as a subtype of Place. Such a facility would be extremely useful when trying to classify entities and to create an overview of the total number of entities of an overarching type.
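To make the last point concrete, the sketch below shows what counting entities by overarching type could look like if a type hierarchy were available. It describes a hypothetical facility, not something EATS provides; the parent table and the list of entity types are invented.

```python
from collections import Counter

# Hypothetical type hierarchy: each type maps to its parent (None for a top-level type).
PARENT = {"County": "Place", "Town": "Place", "Place": None, "Person": None}


def ancestors(entity_type):
    """Return the type itself followed by each of its supertypes."""
    chain = []
    current = entity_type
    while current is not None:
        chain.append(current)
        current = PARENT.get(current)
    return chain


def count_by_overarching_type(entity_types):
    """Count every entity once under its own type and once under each supertype."""
    counts = Counter()
    for entity_type in entity_types:
        for type_name in ancestors(entity_type):
            counts[type_name] += 1
    return counts


counts = count_by_overarching_type(["County", "Town", "Person", "Place"])
print(counts["Place"], counts["County"], counts["Person"])
```

With such a table, a county would be counted both as a County and as a Place, giving the overview of overarching types that the current flat list of entity types cannot provide.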
The approach taken to entity management in the Gascon Rolls project has so far served it well, and is well suited to similar undertakings.
1. Ciula, A., P. Spence, and M. Vieira (2008). Expressing complex associations in the medieval historical documents: the Henry III Fine Rolls Project. Journal of the Association for Literary and Linguistic Computing and the Association for Computers and the Humanities 23(3).
4. The Gascon Rolls (Rotuli Vasconie – C61 class in the UK National Archives) relate to the English Government of the Duchy of Aquitaine (1154-1453) and include information on the Hundred Years War, concluding in 1453 with the end of the Anglo-Gascon union.
6. Ciula, A. (2006). Searching the Fine Rolls: A Demonstration of the Electronic Version. Paper presented at the International Medieval Congress 2006, University of Leeds, July 10-13.
7. Spence, P. (2006). The Henry III Fine Rolls Project. Digital Humanities 2006. The First ADHO International Conference: Conference Abstracts. Université Paris-Sorbonne.
11. Stevenson, A., and J. Norrish (2008). Topic Maps and Entity Authority Records: An Effective Cyber Infrastructure for Digital Humanities. Paper presented at the Digital Humanities 2008 conference. Oulu, Finland, June 25-29, 2008.