Approaches to the Treatment of Primary Materials in Digital Lexicons: Examples of the New Generation of Digital Lexicons for Buddhist Studies


authors & presenters

Nagasaki, Kiyonori, International Institute for Digital Humanities, Japan, nagasaki@dhii.jp

Tomabechi, Toru, International Institute for Digital Humanities, Japan, tomabechi@dhii.jp

Wangchuk, Dorji, University of Hamburg, Germany, dorji.wangchuk@uni-hamburg.de

Takahashi, Koichi, University of Tokyo, Japan, snb44191@nifty.ne.jp

Wallman, Jeff, Tibetan Buddhist Resource Center, New York, USA, jeffwallman@tbrc.org

Muller, A. Charles, University of Tokyo, Japan, acmuller@jj.em-net.ne.jp

Introduction

Recently, several projects have emerged with the shared aim of creating online digital lexicons for Buddhist studies. The possibilities for such lexicons are many, since the Buddhist tradition is extensively multilingual and multicultural and has an unusually broad and numerous readership. As Father Roberto Busa’s initiation of his Thomas Aquinas lexicon project in 1949 shows, the digital lexicon has long been a basic resource in the digital humanities. Buddhist studies was one of the first fields to have its own newly created comprehensive digital lexicon, the Digital Dictionary of Buddhism (DDB; http://www.buddhism-dict.net/ddb). Under continuous development on the web for more than 15 years, it is a successful example of creating a new online reference resource, from both technical and scholarly perspectives. However, recent developments in ICT have given those who aim to make other types of digital lexicons new opportunities to realize their ideals. Most important in this regard is the spread of collaborative frameworks on the Web, whose development TEI P5 has aided. As a result, the movement toward the creation of online lexicons has become steadily more visible.

One basic difficulty in constructing digital lexicons for Buddhist studies is choosing among a great diversity of resources, methodologies, and potential users, while also allowing for improvements to the primary texts that serve as sources of individual entries, since ancient manuscripts continue to be discovered. Textual sources are written in a range of languages including Sanskrit, Pali, Tibetan, and Chinese; other types of materials, such as pictures, statues, and maps, also need to be treated. Methodologies need to be diverse because the field of Buddhist studies draws on the approaches of philosophy, literature, history, culture, psychology, and a number of other disciplines. Users of such lexicons go far beyond specialists in Buddhist studies and include scholars from many other fields, followers of the Buddhist religion, and general users. At the same time, many primary sources have gradually come to be distributed on the Web, so that online lexicons can easily refer to them.

In this presentation, we will focus on the way each project treats primary sources. This is important because there is still considerable debate among these projects regarding the optimal approach. The presentation covers interim results of the ITLR project and the Bauddha Kośa project, as well as results of the work of the Tibetan Buddhist Resource Center. We welcome comments from researchers and practitioners in other fields.

Indo-Tibetan Lexical Resource (ITLR): The Underlying Principle, Policy, and Practice of Employing Primary and Secondary Sources

Wangchuk, Dorji, University of Hamburg, Germany, dorji.wangchuk@uni-hamburg.de

The collaborative project ‘Indo-Tibetan Lexical Resource’ (henceforth ITLR) has been initiated with the sole aim of creating a digital lexical resource that will benefit both academics and non-academics who are engaged in the study of Buddhist (i.e. mainly but not exclusively Indic and Tibetan) textual and intellectual cultures. The ITLR database will include Indic words (or phrases), terms, and names with their corresponding Tibetan (and occasionally also Chinese and Khotanese) translations; their etymologies and explanations found primarily in Indic and Tibetan sources; metonyms or synonyms; enumerative categories and classifications; modern renderings; and related discussions found in modern academic works. All of these will be substantiated with primary and secondary sources. Recognizing the advantages of a digital lexical resource over a printed one, the aim of the ITLR from the very outset has been to create a research tool that is continually improvable, extendable, easily accessible, and reliable.

Reliability is a key issue in dealing with or employing sources and resources. One of the causes of frustration in the field of Buddhist Studies seems to be not the lack of relevant lexicographical resources per se but the lack of a comprehensive, reliable, and up-to-date lexical resource. Of course, we cannot speak of reliability in absolute terms, but maximizing the degree of reliability has been one of the envisioned goals of the ITLR project. Our success will depend not only on the availability of financial, technical, and human resources but also on our competence, cautiousness, and perseverance, and not least on how we use primary and secondary sources.

In this presentation, we will discuss the underlying principle, policy, and practice of employing primary and secondary sources for the ITLR project. It will be argued that while the absence or presence of source references in a lexical work might not in itself indicate its reliability or unreliability, the lack of verifiable evidence, as in the case of most existing digital Tibetan dictionaries, would often undermine its credibility.

The Bauddha Kośa Project: developing a new digital glossary of Buddhist technical terminology

Takahashi, Koichi, University of Tokyo, Japan, snb44191@nifty.ne.jp

Buddhist technical terms are occasionally so profound that it is difficult to translate them into modern languages. Thus, many scholars have made efforts to compile specialized dictionaries of Buddhist terminology. In addition, there is now a movement to develop digital dictionaries that can be browsed on the internet. In this situation, our project attempts to make a new type of digital glossary of Buddhist technical terms, called the Bauddha Kośa, using XML as the data framework.

The basic methodology of our project is, rather than supplying definitions of terms ourselves, to extract statements that explain the meanings of words from classical texts written in various languages. To the statements quoted in our glossary we then add their historical renderings and the annotations originally written in Sanskrit, some of which are available today only in Chinese or Tibetan translation. At the same time, we translate these sentences into modern languages in order to propose more appropriate and intelligible translation equivalents. In other words, the new glossary consists of citations from the classical works and their translation equivalents.
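As a minimal sketch of what such a citation-based entry might look like as a data structure, the following Python fragment models a headword that carries no editorial definition, only quoted passages, historical renderings, and modern translation equivalents. The class names, field names, and all sample values are assumptions made for illustration; they are not the project's actual data model, and the passages are placeholders rather than real quotations.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    """A passage quoted from a classical text, together with its source."""
    source: str    # title of the work quoted (placeholder values are used below)
    language: str  # language code of the passage, e.g. "sa" for Sanskrit
    text: str      # the quoted statement itself


@dataclass
class GlossaryEntry:
    """A headword explained only through citations and translation equivalents."""
    headword: str
    definitions: list[Citation] = field(default_factory=list)
    historical_renderings: list[Citation] = field(default_factory=list)
    modern_translations: dict[str, str] = field(default_factory=dict)

    def languages(self) -> set[str]:
        """Return the languages of all classical passages attached to the entry."""
        passages = self.definitions + self.historical_renderings
        return {citation.language for citation in passages}


# Hypothetical sample entry; every passage is a placeholder, not a real quotation.
entry = GlossaryEntry(
    headword="dharma",
    definitions=[Citation("Source text A", "sa", "(placeholder Sanskrit passage)")],
    historical_renderings=[
        Citation("Chinese translation of A", "zh", "(placeholder Chinese passage)"),
        Citation("Tibetan translation of A", "bo", "(placeholder Tibetan passage)"),
    ],
    modern_translations={"en": "(placeholder English equivalent)"},
)

print(sorted(entry.languages()))
```

Keeping the source on every citation, rather than on the entry as a whole, reflects the point that each quoted statement has to be traceable to its own text.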

As for the policy for digitizing this glossary, our project follows TEI P5 (http://www.tei-c.org/Guidelines/P5/, [2011/09/13]). Although TEI P5 provides adequate elements for encoding a general dictionary or glossary in modern Western languages, our project has occasionally had difficulty encoding our glossary according to the TEI P5 Guidelines, because the glossary is unusual in consisting mainly of citations, each of which requires information about its source. (This issue was reported at the poster session of OSDH 2011 on September 13, 2011.)
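To make the encoding problem concrete, the sketch below uses Python's standard xml.etree.ElementTree to build one entry from elements of the TEI P5 dictionary module (entry, form, orth, sense, cit, quote, bibl). The particular arrangement, with a source reference inside each citation and a nested translation citation, is only one plausible layout assumed for illustration; it is not the schema the project adopted. The function name and all sample values are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("", TEI_NS)  # serialize TEI elements without a prefix


def tei(tag: str) -> str:
    """Return the namespace-qualified name of a TEI element."""
    return f"{{{TEI_NS}}}{tag}"


def build_entry(headword, quote, quote_lang, source, translation, translation_lang):
    """Build one entry whose sense is a sourced citation plus a translation."""
    entry = ET.Element(tei("entry"))
    form = ET.SubElement(entry, tei("form"))
    ET.SubElement(form, tei("orth")).text = headword

    sense = ET.SubElement(entry, tei("sense"))
    cit = ET.SubElement(sense, tei("cit"), {"type": "example"})
    ET.SubElement(cit, tei("quote"), {f"{{{XML_NS}}}lang": quote_lang}).text = quote
    ET.SubElement(cit, tei("bibl")).text = source  # where the passage comes from

    # The modern translation is nested so that it stays attached to its passage.
    trans = ET.SubElement(cit, tei("cit"), {"type": "translation"})
    ET.SubElement(
        trans, tei("quote"), {f"{{{XML_NS}}}lang": translation_lang}
    ).text = translation
    return entry


# Hypothetical placeholder values, not real quotations or references.
entry = build_entry(
    headword="dharma",
    quote="(placeholder Sanskrit passage)",
    quote_lang="sa",
    source="(placeholder source reference)",
    translation="(placeholder English translation)",
    translation_lang="en",
)
print(ET.tostring(entry, encoding="unicode"))
```

In a glossary of this kind almost every sense would be such a citation, so the source information that is optional in a general dictionary entry becomes mandatory throughout.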

Thus, some issues remain to be solved before we can accomplish our purpose, but we are preparing a model of the new digital glossary based on the Abhidharmakośabhāṣya (ed. by P. Pradhan 1967), one of the most important glossaries of Buddhist technical terms, composed by Vasubandhu in the fifth century. This text has two Chinese translations and one Tibetan translation, as well as some historical commentaries. In this presentation, I will discuss, from the philological viewpoint, a few issues in developing the schema for our glossary Bauddha Kośa using TEI P5.

A Dynamic Buddhist Lexical Resource based on Full Text Querying and Tibetan Subject Taxonomies

Wallman, Jeff, Tibetan Buddhist Resource Center, New York, USA, jeffwallman@tbrc.org

To accurately define terms in a lexical resource one must be able to identify the context of those terms in literature. As inquiry into the Tibetan Buddhist lexicon progresses, the Tibetan Buddhist Resource Center (TBRC) offers a framework to evaluate lexical terms (technical terms, concepts, subjects, keywords) in a wide variety of contexts across a massive corpus of source Tibetan texts. Through its preservation and cataloging process, TBRC has developed a method to classify individual works within larger collections according to indigenous Tibetan subject classifications. These indigenous subjects are then organized according to a broader, more generalized framework of knowledge-based taxonomies. Each taxonomy is structured around a series of ‘heap spaces’ – groupings of similar topics, rather than hierarchical structures.

The source literature corpus includes a burgeoning TEI-compliant eText repository as well as scanned source xylographs, manuscripts, and modern reprints, spanning the range of the Tibetan literary heritage. The corpus is the largest online repository of Tibetan materials in the world. The metadata framework and text corpus are the basis of an integrated library resource being developed at TBRC. By typing a technical term into the library, a researcher can see the entry in the database of topics, the location of each topic in a taxonomy, and the relevant associated works. Extending from this controlled entry point, the researcher can then discover and mark up terms in pages by issuing full-text queries across the eText repository.
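The workflow described above can be sketched in a few lines of Python: a taxonomy held as flat groupings of similar topics rather than as a tree, a lookup that returns a topic's grouping and associated works, and a naive full-text query over page texts. This is only an illustration under stated assumptions. The names Topic, find_topic, and full_text_query, the dictionary layout, and all identifiers in the sample data are hypothetical and do not describe TBRC's actual system.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A subject heading and the works classified under it."""
    name: str
    works: list[str] = field(default_factory=list)


def find_topic(term, taxonomy):
    """Return (grouping name, topic) pairs whose topic name equals the term."""
    return [
        (group, topic)
        for group, topics in taxonomy.items()
        for topic in topics
        if topic.name == term
    ]


def full_text_query(term, etexts):
    """Return (work id, page number) for every page that contains the term."""
    return [
        (work_id, page_no)
        for work_id, pages in etexts.items()
        for page_no, page in enumerate(pages, start=1)
        if term in page
    ]


# Hypothetical data: each grouping is a flat list of similar topics.
taxonomy = {
    "group-A": [Topic("term-x", ["work-1"]), Topic("term-y", ["work-2"])],
    "group-B": [Topic("term-z", ["work-1", "work-3"])],
}
# Hypothetical eTexts: work id mapped to a list of page texts.
etexts = {
    "work-1": ["... term-x ...", "no match here"],
    "work-3": ["unrelated", "term-z and term-x together"],
}

for group, topic in find_topic("term-z", taxonomy):
    print(group, topic.works)
print(full_text_query("term-x", etexts))
```

The lookup stands for the controlled entry point, and the full-text query stands for the open-ended discovery step that extends from it; a production system would use an index rather than a linear scan.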

References

Muller, A. Ch. (2011). The Digital Dictionary of Buddhism: A Collaborative XML-Based Reference Work that has become a Field Standard: Technology and Sustainable Management Strategies. Digital Humanities 2011 Conference Abstracts. June 2011, pp. 189-190.

Nagasaki, K., et al. (2011). Collaboration in the Humanities – Through the Case of Development of the ITLR Project –. IPSJ Symposium Series Vol. 2011, No. 8: The Computers and the Humanities Symposium. Dec 2011, pp. 155-160.