This poster will show how the TextGrid Repository assists researchers in the curation of their data and how they can make the data available with TextGrid tools to foster scientific re-use. The increasing importance of digitally-aided research methods has caused exponential growth in the creation of research data. New research methods and collaborative ways of using data sets require sophisticated research infrastructures to support researchers in the Digital Humanities and to enable the re-use of existing data. Data curation must be included in the planning stage as a fundamental requirement for all projects dealing with sustainable data. Therefore, this poster will present an overview of the technical infrastructure and the applicability of the TextGrid Repository for humanities researchers.
The TextGrid Virtual Research Environment (VRE), funded by the German Federal Ministry of Education and Research, provides tools, data and services in one integrated interface and supports the long-term archiving and management of research data. It gives researchers in the Arts and Humanities a platform for curating their data in line with universally recognized best practices and standards. TextGrid consists of two main components: the TextGrid Laboratory (TextGridLab), the entry point to the VRE, and the TextGrid Repository (TextGridRep), a long-term humanities data archive. To preserve and maintain research data and ensure its long-term viability, current research practices in all stages of the research lifecycle must be supported. Therefore, the TextGridLab provides common functionalities in a sustainable environment to facilitate the re-use of data, services, and tools, and the TextGridRep enables researchers to publish and share their data in a way that supports long-term availability and re-usability. Rather than acquiring the technical knowledge necessary for data curation themselves, researchers can use the services and guidelines that the TextGrid VRE offers for long-term data accessibility and sustainability from the initial planning stages of their projects.
After five years of research and development, TextGrid released a stable,
operational version 1.0 in July 2011 and will release a version 2.0 in May
2012.
Research projects that work with TextGrid create significant amounts of data during the research process, and this data requires curation. This poster will show how the TextGridRep assists researchers in the curation of their data and in ensuring persistent access to data with TextGrid tools to support scientific re-use.
The first section of the poster will give an overview of the technical
functionalities and infrastructure of the TextGridRep, which has been fully
operational since July 2011. The TextGridRep provides a repository
infrastructure based on grid technology. Researchers can decide how and with
whom their data will be shared by using the detailed rights management module.
Findings and research data can be published directly from the TextGridLab in the
repository via a publishing process that guides researchers in preparing the
data for long-term accessibility. The middleware consists of various components
for handling files in the data grid, rights management in a role-based access
control-enabled database, metadata in an XML database, and relations in a
Resource Description Framework (RDF) triple store. On a basic level, TextGrid
will offer bitstream preservation with redundant grid storage and tape backup
for 10 years (as recommended in the guidelines of the German Research
Foundation).
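The idea behind role-based access control, as mentioned for the rights management component, can be illustrated with a minimal sketch. The role names, the permission sets, and the `Project` class are hypothetical and chosen only for illustration; they do not reproduce the actual TextGrid rights management module.

```python
from enum import Enum


class Permission(Enum):
    READ = "read"
    WRITE = "write"
    PUBLISH = "publish"


# Hypothetical role names and permission sets, chosen for illustration only.
ROLE_PERMISSIONS = {
    "observer": {Permission.READ},
    "editor": {Permission.READ, Permission.WRITE},
    "manager": {Permission.READ, Permission.WRITE, Permission.PUBLISH},
}


class Project:
    """A project whose members hold roles; permissions follow from the roles."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._roles = {}  # user name -> set of role names

    def grant(self, user: str, role: str) -> None:
        """Give a user a role; unknown role names are rejected."""
        if role not in ROLE_PERMISSIONS:
            raise ValueError(f"unknown role: {role}")
        self._roles.setdefault(user, set()).add(role)

    def revoke(self, user: str, role: str) -> None:
        """Remove a role from a user; a no-op if the user never held it."""
        self._roles.get(user, set()).discard(role)

    def is_allowed(self, user: str, permission: Permission) -> bool:
        """True if any of the user's roles includes the permission."""
        return any(
            permission in ROLE_PERMISSIONS[role]
            for role in self._roles.get(user, ())
        )
```

In such a scheme, access decisions are attached to roles rather than to individual users, so sharing data with a new collaborator only requires granting a role.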
When researchers publish their research data from the TextGridLab to the repository, the metadata they provide is validated automatically. The system validates against the TextGrid object model and checks whether obligatory metadata fields such as rights owner and license are defined. In the next step, persistent identifiers are allocated through a reliable handle service provided by the GWDG, the centre for scientific data processing in Göttingen. The GWDG is a main developing partner in the European Persistent Identifier Consortium and functions as the computer centre for the Max Planck Society. As part of the publishing process, the data is frozen and moved to a storage cluster used for long-term preservation. If researchers want to update their data, they can copy it to their workspace, correct or further annotate it, and publish it as a new revision that is linked to the old revision. Both revisions remain available, but the newer one is ranked more prominently in search results. The grid storage for the humanities and all connected resources are maintained together with those of the other academic disciplines at the common Grid Resource Centre in Göttingen (which has allotted 275 terabytes for the humanities).
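The publishing steps described above (validation of obligatory metadata, allocation of a persistent identifier, freezing, and linked revisions) can be sketched as follows. This is a simplified illustration under stated assumptions: the field names, the `PublishedObject` class, and the `mint_pid` placeholder are hypothetical, and no real handle service or TextGrid component is called.

```python
import uuid
from dataclasses import dataclass
from typing import Optional

# Assumption: only the two obligatory fields named in the text are checked;
# the actual TextGrid object model defines more.
OBLIGATORY_FIELDS = ("rights_owner", "license")


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after publishing
class PublishedObject:
    pid: str                     # placeholder identifier, not a real Handle
    revision: int
    metadata: dict
    content: bytes
    previous_pid: Optional[str]  # link to the older revision, if any


def validate(metadata: dict) -> list:
    """Return the obligatory fields that are missing or empty."""
    return [f for f in OBLIGATORY_FIELDS if not str(metadata.get(f) or "").strip()]


def mint_pid() -> str:
    """Stand-in for a call to a handle service; returns a made-up identifier."""
    return "hdl:EXAMPLE/" + uuid.uuid4().hex[:8]


def publish(
    metadata: dict,
    content: bytes,
    previous: Optional[PublishedObject] = None,
) -> PublishedObject:
    """Validate, assign an identifier, and return a frozen revision."""
    missing = validate(metadata)
    if missing:
        raise ValueError("missing obligatory metadata: " + ", ".join(missing))
    return PublishedObject(
        pid=mint_pid(),
        revision=previous.revision + 1 if previous else 1,
        metadata=dict(metadata),
        content=content,
        previous_pid=previous.pid if previous else None,
    )
```

Publishing a corrected object with `previous=` set to the earlier revision yields a new identifier and a back-link, while the earlier revision stays unchanged.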
The second section of the poster will show how researchers can make their data available with the TextGrid Repository. There are currently three different ways for research groups to enable access to their data in the repository:
1) All published data is available via the TextGridRep portal, which is already
in place. The portal offers simple and advanced search across public research
data, backed by full-text and metadata indexes, as well as the option of
browsing the repository content.
2) Research groups who create a digital edition often want to present their data in their own portal with specific graphics, labels, and predefined browse and search options. Therefore, an open REST interface for individual portal solutions is provided, so that research groups can offer tailored access to their research collections using common technologies such as JavaScript, CSS, and HTML.
3) Research groups who want to provide complex customized visualizations and
complex project-specific search queries for their digital editions often use
their own database for their project that is not connected to any long-term
archiving solutions. We are developing a straightforward and easy way to sync
the data stored in the TextGridRep (for long-term access) with a
project-specific XML database (for the project-specific representation of the
digital edition). A prototype is already in place that enables users to publish
data from the TextGridRep to any eXist database with drag-and-drop
functionality. Users can continuously test the representation of their XML data
(e.g., TEI-formatted) in their own environment while they are still working on
the digital edition with TextGrid. This allows research projects to annotate
their data and develop the representation with XSLT and XQuery scripts at the
same time. They can also easily publish new revisions of their data through
TextGrid in the TextGridRep as well as in their own environments. TextGrid will
provide a TextGrid-specific XQuery module for the eXist XML database via the
newly announced eXist AppRepository.
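To illustrate what copying a document into a project-specific eXist database can involve, the following sketch builds an HTTP PUT request for eXist's REST interface, which stores a document at the addressed collection path. It is not the TextGrid sync prototype: the function name `build_exist_put`, the base URL of a local eXist installation, and the collection and document names are assumptions, and the request is only constructed, not sent.

```python
import urllib.request
from urllib.parse import quote


def build_exist_put(
    base_url: str, collection: str, name: str, xml: bytes
) -> urllib.request.Request:
    """Build (but do not send) an HTTP PUT that stores one XML document."""
    # Percent-encode each path segment separately so that "/" stays a separator.
    segments = (collection.strip("/") + "/" + name).split("/")
    path = "/".join(quote(segment) for segment in segments)
    return urllib.request.Request(
        base_url.rstrip("/") + "/" + path,
        data=xml,
        method="PUT",
        headers={"Content-Type": "application/xml"},
    )


# Assumed base URL of a local eXist installation; adjust to the actual setup.
request = build_exist_put(
    "http://localhost:8080/exist/rest/db",
    "edition/letters",
    "letter 1.xml",
    b"<TEI xmlns='http://www.tei-c.org/ns/1.0'/>",
)
```

Sending the request (for example with `urllib.request.urlopen`) would additionally require credentials for a user who may write to the target collection, which are omitted here.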
XML technologies like XQuery and XSLT support the representation of digital editions using a common standard that promotes long-term reusability of and reliable access to research data. TextGrid therefore facilitates the publication of digital editions in ways that are easy to use and that encourage established best practices.
TextGrid maintains a strong network with other professional associations and
eHumanities centres as well as with research infrastructure initiatives and
projects both nationally (DARIAH-DE,
Funding
The initial funding phase by the German Federal Ministry of Education and Research (BMBF) lasted from February 2006 to May 2009 (BMBF reference number 07TG01A-H). The second funding phase covered the period from 1 June 2009 to 31 May 2012 (BMBF reference number 01UG0901A).