The Sharing Ancient Wisdoms (SAWS) project explores and analyses the tradition of wisdom literatures in ancient Greek, Arabic and other languages, by presenting the texts digitally in a manner that enables linking and comparisons within and between anthologies, their source texts, and the texts that draw upon them. We are also creating a framework through which other projects can link their own materials to these texts via the Semantic Web, thus providing a ‘hub’ for future scholarship on these texts and in related areas. The project is funded by HERA (Humanities in the European Research Area) as part of a programme to investigate cultural dynamics in Europe, and is composed of teams at the Department of Digital Humanities and the Centre for e-Research at King’s College London, The Newman Institute Uppsala in Sweden, and the University of Vienna.
Throughout antiquity and the Middle Ages, anthologies of extracts from larger texts containing wise or useful sayings were created and circulated widely, as a practical response to the cost and inaccessibility of full texts in an age when these existed only in manuscript form (Rodríguez Adrados 2009: 91-97 on Greek models; Gutas 1981). SAWS focuses on gnomologia (also known as florilegia), which are manuscripts that collected moral or social advice, and philosophical ideas, although the methods and tools developed are applicable to other manuscripts of an analogous form (e.g. medieval scientific or medical texts; Richard 1962).
The key characteristics of these manuscripts are that they are collections of smaller extracts of earlier works, and that, when new collections were created, they were rarely straightforward copies. Rather, sayings were reselected from various other manuscripts, reorganised or reordered, and subtly (or not so subtly) modified or reattributed. The genre also crossed linguistic barriers, in particular through translation into Arabic, and again the results were rarely straightforward translations; they tend to be variations. In later centuries, these collections were translated into western European languages, and their significance is underlined by the fact that Caxton’s first imprint (the first book ever published in England) was one such collection (Caxton 1877). Thus the corpus of material can be regarded as a very complex directed network or graph of manuscripts and individual sayings that are interrelated in a great variety of ways, an analysis of which can reveal a great deal about the dynamics of the cultures that created and used these texts.
The SAWS project therefore has three main aspects:
Each of the texts is being marked up in TEI-conformant XML and validated against a customised schema designed at King’s College London for the encoding of gnomologia. Our structural markup reflects as closely as possible the way in which the scribe laid out the manuscript. The TEI schema uses the <seg> element to mark up base units of intellectual interest (not necessarily identified as single units by the scribe), such as a saying (statement) together with its surrounding story (narrative). For example:
Alexander, asked whom he loved more, Philip or Aristotle, said: ‘Both equally, for one gave me the gift of life, the other taught me to live the virtuous life.’
This contains both a statement and a narrative:
Alexander, asked whom he loved more, Philip or Aristotle, said:
Both equally, for one gave me the gift of life, the other taught me to live the virtuous life.
Each of these <seg> elements can be given an @xml:id to provide a unique identifier (which can be automatically generated) that differentiates them from all other examples of <seg>, for instance <seg type="statement" xml:id="K.al-Haraka_ci_s1">. In other words, it allows each intellectually interesting unit (as identified by our team’s scholars) to be distinguished from each other unit, thus providing the means of referring to a specific, often very brief, section of the text.
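The segmentation described above can be illustrated with a minimal sketch. It is not the project's own tooling: the wrapping <ab> element, the "narrative" value of @type, and the identifiers example_n1 and example_s1 are assumptions made for illustration only. The sketch parses a small TEI fragment and indexes each <seg> by its @xml:id.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment: the <ab> wrapper, the "narrative" type value and
# both xml:id values are illustrative, not taken from the SAWS files.
SAMPLE = """<ab xmlns="http://www.tei-c.org/ns/1.0">
  <seg type="narrative" xml:id="example_n1">Alexander, asked whom he loved more, Philip or Aristotle, said:</seg>
  <seg type="statement" xml:id="example_s1">Both equally, for one gave me the gift of life, the other taught me to live the virtuous life.</seg>
</ab>"""


def index_segments(xml_text):
    """Map each xml:id to a (type, text) pair for every <seg> that has an id."""
    root = ET.fromstring(xml_text)
    index = {}
    for seg in root.iter(f"{{{TEI_NS}}}seg"):
        seg_id = seg.get(XML_ID)
        if seg_id is None:
            continue  # segments without an identifier cannot be referred to
        if seg_id in index:
            raise ValueError(f"duplicate xml:id: {seg_id}")
        index[seg_id] = (seg.get("type"), "".join(seg.itertext()).strip())
    return index


segments = index_segments(SAMPLE)
for seg_id, (seg_type, text) in segments.items():
    print(f"{seg_id} [{seg_type}]: {text}")
```

Because every identifier is unique, each unit, however brief, can be cited on its own by anything that knows its @xml:id.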
Several types of relationships have been identified within and between the manuscripts. These manuscript relationships exist at many different levels of granularity, from links between individual sayings to interconnections in families of manuscripts. These relationships are represented using an ontology that extends the FRBR-oo model (Doerr & LeBoeuf 2007), a harmonisation of the FRBR model of bibliographic records (Tillett 2004) with the CIDOC Conceptual Reference Model (CIDOC-CRM; Doerr 2003). The SAWS ontology, developed through collaboration between domain experts and technical observers, models the classes and links in the SAWS manuscripts. Basing the SAWS ontology around FRBR-oo provides most of the vocabulary needed for both the bibliographic (FRBR) and cultural heritage (CIDOC) aspects being modelled.
Using this underlying ontology as a basis, links between (or within) manuscripts can be added to the TEI documents using RDF markup. RDF is a specification for representing one-way links (known as triples) from a subject entity to an object entity. The use of RDF to represent relationships in markup has to date primarily been seen in XHTML documents containing RDFa (RDF in attributes); however, it is desirable to extend the scope of RDF in markup, so that RDF links can be added directly into documents such as the TEI XML documents used in SAWS (Jewell 2010).
To include RDF triples in TEI documents, three entities have to be represented for each triple: the subject being linked from, the object being linked to, and a description of the link between them. The subject and object entities in the RDF triple are represented by the @xml:id that has been given to each of the TEI sections of interest. We use the TEI element <relation/> (recently added to TEI) to place RDF markup in the SAWS documents, with four attributes as follows:
This is equivalent to stating that the Arabic segment identified as ‘K._al-Haraka_ci_s5’ is a close rendering of the Greek segment identified as ‘Proclus_ET_Prop.17_ci1’, and that this relationship has been asserted by Elvira Wakelnig, 2012.
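The relation just described could be encoded and read back roughly as in the sketch below. Several details are assumptions rather than the project's confirmed encoding: the attribute names @active, @ref, @passive and @resp (standard TEI attributes available on <relation/>), the "#id" pointer form, the predicate name isCloseRenderingOf with a "#" separator after the ontology URL, the <ab> wrapper, and the hypothetical @resp value.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

TEI_NS = "http://www.tei-c.org/ns/1.0"


class Assertion(NamedTuple):
    subject: str      # entity linked from (@active)
    predicate: str    # description of the link (@ref)
    obj: str          # entity linked to (@passive)
    responsible: str  # who asserted the link (@resp)


# Hypothetical encoding: predicate name, pointer syntax and the @resp value
# are assumptions made for this sketch.
SAMPLE = """<ab xmlns="http://www.tei-c.org/ns/1.0">
  <relation active="#K._al-Haraka_ci_s5"
            ref="http://purl.org/saws/ontology#isCloseRenderingOf"
            passive="#Proclus_ET_Prop.17_ci1"
            resp="#EWakelnig_2012"/>
</ab>"""


def relations_to_assertions(xml_text):
    """Turn every <relation/> in the fragment into a subject-predicate-object record."""
    root = ET.fromstring(xml_text)
    assertions = []
    for rel in root.iter(f"{{{TEI_NS}}}relation"):
        attrs = rel.attrib  # a missing attribute raises KeyError, flagging an incomplete link
        assertions.append(Assertion(
            subject=attrs["active"].lstrip("#"),
            predicate=attrs["ref"],
            obj=attrs["passive"].lstrip("#"),
            responsible=attrs["resp"].lstrip("#"),
        ))
    return assertions


assertions = relations_to_assertions(SAMPLE)
for a in assertions:
    print(f"{a.subject} --{a.predicate}--> {a.obj} (asserted by {a.responsible})")
```

The link is one-way: the Arabic segment is the subject and the Greek segment the object, so reversing @active and @passive would assert a different relationship.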
The <relation/> element can be placed anywhere within the XML document, or indeed in a separate XML document if required: for our own purposes we have found it useful to place it immediately after the closing tag of the <seg> identified as the ‘active’ entity.
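Since the link is carried by identifiers rather than by position, relations held in a separate document can still be resolved against the text. The sketch below illustrates one way to check this; the document contents, the identifiers, the use of <listRelation> as a container, the attribute names and the predicate URL are all assumptions for illustration, not the SAWS files themselves.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical text document containing two identified segments.
TEXT_DOC = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <seg type="statement" xml:id="seg_a1">First saying</seg>
  <seg type="statement" xml:id="seg_b1">Second saying</seg>
</body>"""

# Hypothetical stand-off document holding only relations; the second
# relation deliberately points at an identifier that does not exist.
LINKS_DOC = """<listRelation xmlns="http://www.tei-c.org/ns/1.0">
  <relation active="#seg_a1" passive="#seg_b1" resp="#editor1"
            ref="http://purl.org/saws/ontology#isCloseRenderingOf"/>
  <relation active="#seg_a1" passive="#seg_missing" resp="#editor1"
            ref="http://purl.org/saws/ontology#isCloseRenderingOf"/>
</listRelation>"""


def known_ids(*docs):
    """Collect every xml:id declared in the given documents."""
    ids = set()
    for doc in docs:
        for element in ET.fromstring(doc).iter():
            if XML_ID in element.attrib:
                ids.add(element.attrib[XML_ID])
    return ids


def dangling_pointers(links_doc, ids):
    """List (role, target) pairs whose target is not a known xml:id."""
    missing = []
    for rel in ET.fromstring(links_doc).iter(f"{{{TEI_NS}}}relation"):
        for role in ("active", "passive"):
            target = rel.attrib[role].lstrip("#")
            if target not in ids:
                missing.append((role, target))
    return missing


dangling = dangling_pointers(LINKS_DOC, known_ids(TEXT_DOC))
print(dangling)
```

A check of this kind is one reason to keep identifiers stable: a relation stored far from its segments remains valid only while both identifiers still resolve.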
The project is thus producing a framework for representing these relationships, using an RDF-based semantic web approach, as well as tools for creating these complex resources, and for visualising, analysing, exploring and publishing them (Heath & Bizer 2011). We are engaging with scholars working on related projects in order to establish an agreed set of predicates; the version currently being evaluated and deployed by scholars is available as an ontology at http://purl.org/saws/ontology. The number of manuscripts of this type is large, and the project is creating the kernel of an envisaged much larger corpus of interrelated material. Many of the subsequent contributions will be made by others; therefore we are creating a framework of tools and methods that will enable researchers to add texts and relationships of their own, which will be managed in distributed fashion. We are also linking to existing Linked Data sources about the ancient world, most notably the Pleiades gazetteer of ancient places and the Prosopography of the Byzantine World, which aims to document all the individuals mentioned in Byzantine textual sources from the seventh to thirteenth centuries.
Thus we will create an interactive environment that enables researchers not only to search or browse this material in a variety of ways, but also to process, analyse and build upon the material. This environment will provide tools to browse, search and query the information in the manuscripts, as well as making available the SAWS-specific TEI schema and ontology files and the XSLTs used to extract semantic information from the structural markup and metadata within the TEI markup. We also hope to make available tools for adding and editing manuscripts within the SAWS environment. The ultimate aim is to create a network of information, comprising a collection of marked-up texts and textual excerpts, which are linked together to allow researchers to represent, identify and analyse the flow of knowledge and transmission of ideas through time and across cultures.
Caxton, W. (1877). The Dictes and Wise Sayings of the Philosophers (originally published London, 1477), reprinted 1877 (London: Elliot Stock).
Doerr, M. (2003). The CIDOC CRM – an Ontological Approach to Semantic Interoperability of Metadata. AI Magazine 24(3).
Doerr, M., and P. LeBoeuf (2007). Modelling Intellectual Processes: The FRBR – CRM Harmonization. Digital Libraries: Research and Development, Vol. 4877. Berlin: Springer, pp. 114-123.
Gutas, D. (1981). Classical Arabic Wisdom Literature: Nature and Scope. Journal of the American Oriental Society 101(1): 49-86.
Heath, T., and C. Bizer (2011). Linked Data: Evolving the Web into a Global Data Space. San Rafael, CA: Morgan & Claypool, pp. 1-136.
Jewell, M. O. (2010). Semantic Screenplays: Preparing TEI for Linked Data. Digital Humanities 2010, Friday 9 July, London, UK. http://dh2010.cch.kcl.ac.uk/academic-programme/abstracts/papers/pdf/ab-878.pdf
Pleiades gazetteer: http://pleiades.stoa.org/. Last accessed 9th March 2012.
Prosopography of the Byzantine World: http://www.pbw.kcl.ac.uk. Last accessed 9th March 2012.
RDF/XML Syntax Specification (Revised). http://www.w3.org/TR/rdf-syntax-grammar/. Last accessed 31st October 2011.
RDFa in XHTML: Syntax and Processing. http://www.w3.org/TR/rdfa-syntax/. Last accessed 31st October 2011.
Richard, M. (1962). Florilèges grecs. Dictionnaire de Spiritualité V, cols. 475-512.
Rodríguez Adrados, F. (2009). Greek Wisdom Literature and the Middle Ages: The Lost Greek Models and Their Arabic and Castilian Translations. Bern: Peter Lang.
Tillett, B. (2004). What is FRBR? A Conceptual Model for the Bibliographic Universe. Library of Congress Cataloging Distribution Service, Library of Congress 25, pp. 1-8.