Computer-supported retrieval of manuscripts based on the visual features of a query image is a highly desirable, but rarely available service for manuscript research. The service could be used, for example, to check whether a manuscript, specified by a copy, is contained in a museum collection. This kind of retrieval is often approximated by making use of an index based on textual annotations, and thus requires extensive manual preparation. Retrieval based on a query image without annotations, on the other hand, promises to be mainly automatic and also support interesting applications beyond document retrieval. Most importantly, this service can allow retrieval of manuscripts containing the query image as a detail. For example, one could find manuscripts where characters are written in a specific way, exemplified in the query image. Moreover, one could search for the occurrence of writing patterns consisting of arbitrary graphical features, in short graphs. Similar graphs, retrieved from different manuscripts, may contribute valuable information about a possible scribe identity or a common origin of manuscripts.
In this contribution we describe a novel approach for graph retrieval based on local descriptors at ‘interest points’. Interest points (IPs) specify locations of strong ‘cornerness’ of the image intensities and thus provide reasonably stable reference points for local descriptions. They have proved their worth in many image analysis applications, in particular in image retrieval solutions based on SIFT features (Lowe 2004). Different from SIFT features, our descriptors are not scale and rotation invariant, although tolerant to small variations, giving rise to a distinctly superior performance and efficiency while preserving its usefulness for many concerns of manuscript research. Basically, each descriptor consists of the structure tensors (Koethe 2003) in the neighborhood of an IP, thus giving a precise account of the local gradient distribution. Depending on the resolution of the manuscripts and the chosen size of the neighborhood, the descriptor of a single IP may comprise several hundred feature values, called IP features in the sequel. For highly detailed query images, for example depicting a Chinese character, there may be a large number of IPs (in our experiments up to 40 ), each of which is recorded with its location and its feature values in a local descriptor.
For retrieval, a target image is processed essentially in the same way as the query image, i.e. IPs and IP features are determined. This is done only once and off-line, comparable to establishing an annotation. The main retrieval task is then to find a subset of IPs in a target image whose relative locations and feature vectors best match the query descriptors. We will present an approach which profits from prior segmentation of the target image into possible matching candidates, but can also be applied to large datasets without segmentations. It is controlled by a simple probabilistic model for the kind of differences between query and data which should be tolerated for a retrieval, with parameters which can be adjusted by a manuscript researcher.
First experimental results with Chinese characters in historical manuscripts indicate that our descriptor is tolerant with respect to a certain amount of variations of the same character, yet quite discriminative with respect to structurally different characters, promising high precision and recall.
In the remainder of this abstract, we describe related work in Section 2, give details about the technical implementation in Section 3, and finally report about experimental results in Section 4.
Retrieving graphs from manuscripts is a special case of Content-Based Image Retrieval (CBIR), a well-established research field with a rich set of methods (Datta et al. 2008). But CBIR applied to manuscripts has found very little attention in this community. Most relevant work builds on handwriting recognition which is increasingly applied to historical manuscripts (Fischer et al. 2009; Yosef et al. 2007; Lavrenko et al. 2004; Shrihari et al. 2005). Most approaches so far use retrieval based on words or characters (Lavrenko et al. 2004; Adamek et al. 2007; Srihari et al. 2005; Zhang et al. 2004) which limits the applicability to handwritings with clear word or character separation. A more general approach, as followed in this contribution, relies on retrieving a spatial configuration of graphical features, extracted from the query (Su et al. 2009). For all approaches, the key question is which features to use for the comparison of query and data. It has been shown that the spatial distribution of gradients gives the best results (Zhang et al. 2004; Ding et al. 2007). In most approaches, descriptors based on simple gradient computations (Srihari et al. 2005; Ding et al. 2007) are assigned to a fixed mesh covering a segment. In our work, descriptors are based on structure tensors (Koethe 2003) at IPs determined for each query independent of a segmentation. Furthermore, we use a novel probabilistic model applicable to IPs.
Since our focus is on the retrieval of arbitrary writing patterns, we do not discuss work on OCR here, although OCR could also be an application area worth exploring with our approach.
Due to the restricted length of this abstract, we cannot provide many technical details here. The IPs in our approach are computed using the Harris Corner Detector (Harris and Stephens, 1988) with the improvements introduced in (Koethe, 2003). Fig. 1 shows typical results achieved for manuscripts with Chinese and Arabic handwritings.
Local descriptors are computed for each IP by combining the structure tensors of all points in the neighbourhood (in our examples of size 11 x 11) into a large feature vector characterizing strength and directionality of intensity gradients.
To retrieve matching patterns from target data, IPs of the query are incrementally compared with IPs of the data using a probabilistic model comprising the following boolean probabilities, both for the hypothesis H1 that the target is a match, and the hypothesis H0 that it is not a match:
PIP target descriptor missing in window at query location
Ploc target descriptor dislocated from query location
Pfeat target descriptor features differing from query descriptor features
Let A be pairs of descriptors of query and target, P(H0) and P(H1) the prior probabilities of the respective hypotheses, then for a hypothesis test we have to evaluate:
The comparison is formulated as two hypothesis tests, the first whether the query pattern is contained in the target, and the second whether the target pattern, constrained by a successful first test, is contained in the query. The target is considered a match of the query, if both tests succeed.
We have carried out first retrieval experiments with Chinese and Arabic manuscripts. Fig. 2 left shows a section of the Fo shuo Tiwei jing (British Library Or.8210/S. 2051). The left-most white box marks a character used as a query, the other boxes show matching characters found by the retrieval system. There have been no false negatives, but the second hit in Column 6 is a false positive, although quite similar to the query. Similar results have been achieved for other queries, applied to a manuscript section with 364 characters.
Figure 2a (top) shows part of Vollers0015,S.2 from the Refaiya Library in Leipzig. The section marked with a box (shown also in Fig. 1b) was used as a query for the same manuscript and retrieved as the only hit as to be expected. Further experiments are on the way.
This work was supported by the Deutsche Forschungsgemeinschaft (DFG) in the Research Group ‘Manuscript Cultures in Asia and Africa’ and the Collaborative Research Center for the Study of Manuscript Cultures SFB 950.
Adamek, T., N. E. O’Connor, N. Murphy, and A. F. Smeaton (2007). Word Matching Using Single Closed Contours for Indexing Handwritten Historical Documents. International Journal on Document Analysis and Recognition 9(2-4): 153-165.
Datta, R., D. Joshi, J. Li, and J. Z. Wang (2008). Image Retrieval: Ideas, Influences, and Trends of the New Age. ACM Computing Surveys 40( 2): 1-60.
Ding, K., Z. Liu, L. Jin, and X. Zhu (2007). A Comparative Study of Gabor Feature and Gradient Feature for Handwritten Chinese Character Recognition. Proceedings International Conference on Wavelet Analysis and Pattern Recognition, pp. 1182-1186.
Fischer, A., M. Wüthrich, M. Liwicki, V. Frinken, H. Bunke, G. Viehhauser, and M. Stolz (2009). Automatic Transcription of Handwritten Medieval Documents. Proceedings 15th International Conference on Virtual Systems and Multimedia, pp. 137-142.
Harris, C., and M. Stephens (1988). A Combined Corner and Edge Detector. Proceedings 4th Alvey Vision Conference, pp. 147–151.
Koethe, U. (2003). Edge and Junction Detection with an Improved Structure Tensor. Proceedings 25th DAGM Symposium, pp. 25-32.
Lavrenko, V., T. Rath, and R. Manmatha (2004). Holistic Word Recognition for Handwritten Historical Documents. Proceedings Document Image Analysis for Libraries (DIAL), pp. 278-287.
Lowe, D. G. (2004). Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision 60(2): 91-110.
Srihari, S. N., C. Huang, and H. Srinivasan (2005). A Search Engine for Handwritten Documents. Proceedings SPIE-IS&T Electronic Imaging, pp. 66-75.
Su, T.-S., T.-W. Zhang, D. J. Guan, and H. J. Huang (2009). Off-line Recognition of Realistic Chinese Handwriting Using Segmentation-free Strategy. Pattern Recognition 42(1): 167-182.
Yosef, I. B., I. Beckman, K. Kedem, and I. Dinstein (2007). Binarization, Character Extraction, and Writer Identification of Historical Hebrew Calligraphy Documents. International Journal on Document Analysis and Recognition (IJDAR) 9: 89-99.
Zhang, B., S. N. Srihari, and C. Huang (2004). Word Image Retrieval Using Binary Features. Proceedings Document Recognition and Retrieval XI, pp. 45-53.