DARIAH aims to support and enhance digitally enabled research across the arts and humanities, and builds and maintains a research infrastructure for the wider digital humanities community. The project is based on national contributions; this paper arises from the German contribution, DARIAH-DE (DARIAH-EU 2012; DARIAH-DE 2012). To develop adequate strategies, it is necessary to take existing humanities projects as a point of reference. For this reason, contact was established with the ‘Virtual Scriptorium St. Matthias’. Interdisciplinary collaboration of this kind can work both ways: genuine research questions in the humanities concerning the persistence and change of cultural and medial identities can gain new impulses and possibilities through the representation of analogue artefacts in the digital medium, and humanities projects can learn to define their technical requirements more precisely in order to obtain suitable solutions.
The ‘Virtual Scriptorium St. Matthias’ will be an online edition with images of more than 450 medieval codices, mostly written between the eighth and sixteenth centuries, and a database with information from several manuscript catalogues. Although the library's holdings were dispersed during secularisation, most codices remained in the Public Library of Trier and in the library of the Episcopal Seminary in Trier (Becker 1996: 101-103). The latest catalogue of the codices used for the reconstruction can be found in Becker (1996: 66-71, 105-234). The project aims to enable users to analyse a codex from any place at any time, and also to present the codices as an ensemble of medieval writing and reading culture and as an institution of scholarship and knowledge (Embach et al. 2011: 492).
The project is coordinated by the Public Library and Archive of Trier and the Center for Digital Humanities at the University of Trier. It is supported by the German Research Foundation and realised in cooperation with many institutions all over the world that now hold manuscripts from the library and scriptorium of the Benedictine abbey St. Eucharius / St. Matthias in Trier. The digitised content will be shown not only on the project homepage but also on the portals of the TextGrid Repository and Manuscripta mediaevalia (Virtuelles Skriptorium St. Matthias 2011-2012; TextGrid Repository n.d.; Manuscripta mediaevalia n.d.).
Figure 1: Proposed architecture
Requirements for a Storage Infrastructure
An analysis of the current digitisation and access process in Trier yields several requirements for an architecture suited to the ‘Virtual Scriptorium St. Matthias’. Because the manuscripts are distributed, data ingest and access to the storage resource should be possible worldwide. The ingested images should be replicated, and their integrity should be checked regularly, to ensure reliable long-term storage.
To support the ‘Virtual Scriptorium’, an architecture with several locations is proposed, as shown in figure 1. The images are produced and ingested into the system in Trier. An automatic replication places a copy in the local data center to keep the currently offered web view available. Another replication process transfers the data to Karlsruhe, where it is securely stored inside the Large Scale Data Facility (Stotzka et al. 2011). Karlsruhe is the coordinating data center and is responsible for the further replications to all remaining locations. With these procedures, the actual number of copies depends on the number of attached locations, but at least three copies are available at all times. Although the ingest process described is handled locally, data can be ingested worldwide via the web. For large amounts of data, a local storage server can be connected to the storage and replication system. Access works in a similar way: the data is accessed either via the web or via a local copy that was replicated from Karlsruhe to the specific location. The replication process itself and the data integrity checks have to be provided by storage virtualisation software. Such software is needed because several servers have to be managed, and it allows the system to be extended with additional servers as the users of the ‘Virtual Scriptorium’ require.
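The replication topology described above can be sketched as a small directed graph. This is only an illustrative model under stated assumptions: the location labels (`trier-ingest`, `trier-datacenter`, `karlsruhe`) and the helper names are hypothetical, and only the shape of the graph follows the text.

```python
from collections import deque

# Hypothetical location labels; only the topology follows the description:
# Trier replicates to its local data center and to Karlsruhe, and Karlsruhe
# (the coordinating data center) replicates to all further locations.
REPLICATION_EDGES = {
    "trier-ingest": ["trier-datacenter", "karlsruhe"],
    "karlsruhe": [],
}


def replication_targets(edges, source):
    """Return every location that ends up holding a copy (breadth-first order)."""
    seen = [source]
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for target in edges.get(node, []):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


def attach_location(edges, name, coordinator="karlsruhe"):
    """Return a new topology in which the coordinator also replicates to `name`."""
    extended = {node: list(targets) for node, targets in edges.items()}
    extended.setdefault(coordinator, []).append(name)
    return extended
```

In this model the base topology yields three copies, and every location attached through the coordinating data center adds one more, which matches the stated lower bound of three copies.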
Figure 2: Implementation of the first iRODS zone
Although different software solutions could in principle be used to realise the architecture, iRODS, with its flexible overall structure, seems to provide the most convenient mechanisms for handling the humanities research data of the ‘Virtual Scriptorium St. Matthias’ (iRods 2012). In this implementation, two iRODS zones are used to realise a distributed, reliable storage resource. The responsibilities for the data inside a zone and for the required replications are clearly assigned to that zone. In addition, overall performance is improved by using two databases.
The data is ingested into the ‘DARIAH-Trier’ zone (figure 2) using the iRODS Explorer or the command line tool iCommands with the iRODS-specific transfer protocol. The replication to the local data center in Trier and to Karlsruhe, both of which are located outside ‘DARIAH-Trier’, is realised with iRODS rules that are triggered by a data ingest and then invoke shell scripts to replicate via scp and iCommands, respectively.
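The ingest-triggered replication step can be illustrated with a short sketch. It is not the project's actual rule or script: the target host, the target collection, and the choice of `iput -K` as the iCommands call are assumptions made for illustration. By default the function only prints the commands it would run.

```python
import shlex

# Hypothetical targets; neither name comes from the paper.
LOCAL_DATACENTER = "replica@datacenter.example.org:/srv/scriptorium/"
KARLSRUHE_COLLECTION = "/exampleZone/home/scriptorium"


def replication_commands(image_path):
    """Build one scp command (local data center) and one iCommands upload (Karlsruhe)."""
    scp_cmd = ["scp", "-p", image_path, LOCAL_DATACENTER]
    # Assumption: iput with -K (verify checksum) stands in for the iCommands step.
    iput_cmd = ["iput", "-K", image_path, KARLSRUHE_COLLECTION]
    return [scp_cmd, iput_cmd]


def on_ingest(image_path, run=None):
    """Hook to call after a file has been ingested.

    With run=None the commands are only printed (dry run). Pass
    run=subprocess.run to execute them for real.
    """
    commands = replication_commands(image_path)
    for command in commands:
        if run is None:
            print(shlex.join(command))
        else:
            run(command, check=True)
    return commands
```

Keeping command construction separate from execution makes the replication targets easy to inspect before any data is transferred.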
The replication inside the second zone, ‘DARIAH-MUSE’ (figure 3), is also realised with iRODS rules, which copy new data to every server of the zone and synchronise the attached servers. In the first transfers, mean transfer rates of 9 MB/s between Karlsruhe and Trier, 30 MB/s between Trier and the local data center in Trier, and 11 MB/s between Trier and Karlsruhe were achieved. Access to both zones is possible using various iRODS clients. In addition to the mechanisms described, bit preservation services are executed. An MD5 checksum is computed when a file is uploaded, stored inside the iRODS system, and checked when the file is downloaded. In Karlsruhe, the checksum is recomputed once a month, and a corrupted file is replaced with a valid copy if the values do not match.
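The bit preservation procedure can be sketched in a few lines. This is a minimal illustration using plain files and Python's standard library rather than the iRODS mechanisms actually used; the function names and the replica list are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_and_repair(replicas, expected_md5):
    """Recompute the checksum of every replica and repair mismatches.

    `expected_md5` is the value stored at upload time. A replica whose
    checksum differs (or that is missing) is overwritten with a valid copy.
    Returns the list of repaired paths.
    """
    valid = [p for p in replicas if Path(p).is_file() and md5_of(p) == expected_md5]
    if not valid:
        raise RuntimeError("no valid replica left to repair from")
    repaired = []
    for path in replicas:
        if path not in valid:
            shutil.copyfile(valid[0], path)
            repaired.append(path)
    return repaired
```

A periodic job, such as the monthly check described above, would call `audit_and_repair` for each stored file together with the checksum recorded at upload.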
Figure 3: Implementation of the second iRODS zone
An initial implementation was realised with the locations Trier and Karlsruhe and will be extended in the near future. So far, several codices have been successfully ingested and replicated. The humanities context of this implementation posed several challenges. Most of the features are highly automated so that they are as easy as possible for humanities researchers to use. The ‘Virtual Scriptorium St. Matthias’ provides a large number of files that have to be stored. This is addressed by using a scalable data management system and by building a distributed system to increase efficiency and reliability. As humanities researchers think in centuries rather than in years, the system has to be sustainable and available in the long term. Integrating the implementation into the DARIAH infrastructure addresses this demand, but a comprehensive solution is still under research.
Additionally, the architecture has a generic design so that it can provide a storage resource for other humanities projects as well. This feature is necessary for the infrastructure the DARIAH project aims to build. The implementation described is therefore a basic but fundamental component: it ensures preservation at the bit level and serves as a basis for ongoing developments.
This work has been supported by DARIAH-DE which is partially funded by the German Federal Ministry of Education and Research (BMBF) under the D-Grid initiative by agreement 01UG1110A-M.
The work has also been supported by the KIT startup budget for the ‘Build-up of an Experimental Research Data Repository (e-Repos)’.
Becker, P. (1996). Die Benediktinerabtei St. Eucharius – St. Matthias vor Trier. Berlin, New York: de Gruyter.
DARIAH-DE (2012). Available from: http://www.de.dariah.eu. (Accessed 16 March 2012).
DARIAH-EU (2012). Available from: http://www.dariah.eu. (Accessed 16 March 2012).
Embach, M., C. Moulin, and A. Rapp (2011). Die mittelalterliche Bibliothek als digitaler Wissensraum. Zur virtuellen Rekonstruktion der Abteibibliothek von Trier-St. Matthias. In R. Plate and M. Schubert (eds.), Mittelhochdeutsch. Beiträge zur Überlieferung, Sprache und Literatur. Festschrift für Kurt Gärtner zum 75. Geburtstag. Berlin, Boston: de Gruyter, pp. 486-497.
iRods (2012). Available from: https://www.irods.org/. (Accessed 16 March 2012).
Manuscripta mediaevalia (n.d.). Available from: http://www.manuscripta-mediaevalia.de/. (Accessed 16 March 2012).
Stotzka, R., V. Hartmann, T. Jejkal, M. Sutter, J. van Wezel, M. Hardt, A. Garcia, R. Kupsch, and S. Bourov (2011). Perspective of the Large Scale Data Facility (LSDF) Supporting Nuclear Fusion Applications. Proceedings of the 19th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), Ayia Napa: IEEE Press, pp. 373-379.
TextGrid Repository (n.d.). Available from: http://www.textgridrep.de/. (Accessed 16 March 2012).
Virtuelles Skriptorium St. Matthias (2011-2012). Available from: http://www.stmatthias.uni-trier.de. (Accessed 16 March 2012).