Probabilistic Indexing and Search for Information Extraction on Handwritten German Parish Records. 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), 2018, pp. 44-49.

We endeavor to perform very large-scale indexing of an ancient German collection of manuscript parish records. To this end we will compute "probabilistic indexes" (PIs), which are known to allow for very accurate and efficient implementation of (single-)keyword spotting. PIs may become prohibitively large for vast manuscript collections. Therefore we analyze simple index pruning methods to achieve adequate tradeoffs between memory requirements and search performance. We also study how to adequately deal with the large variety of non-ASCII symbols and handwritten word spelling variations (accents, umlauts, etc.) which appear in historical collections of this kind. Finally, and most importantly, since most of the images of the collection we aim to index are handwritten tables, we explore the use of PIs to support structured queries for information extraction from untranscribed handwritten images containing tabular data. Empirical results on a small, but complex and representative, dataset extracted from the collection confirm the viability and adequacy of the chosen approaches.
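
To make the pruning and spelling-variation ideas concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the function names (`fold`, `build_index`, `search`), the `(word, image_id, probability)` input format, and the threshold values are all assumptions chosen for illustration. It only shows how low-probability entries might be pruned to save memory, and how accents and umlauts might be folded so that variant spellings share one index key.

```python
import unicodedata
from collections import defaultdict


def fold(word: str) -> str:
    """Lowercase a word and strip combining marks (hypothetical normalization)."""
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def build_index(spots, min_prob=0.1):
    """Build a toy index from (word, image_id, probability) tuples.

    Entries with probability below min_prob are pruned (assumed pruning rule).
    """
    index = defaultdict(dict)
    for word, image_id, prob in spots:
        if prob < min_prob:
            continue  # pruning: drop unlikely hypotheses to reduce index size
        key = fold(word)
        # Keep the highest probability seen for this key in this image.
        index[key][image_id] = max(prob, index[key].get(image_id, 0.0))
    return index


def search(index, query, threshold=0.5):
    """Return (image_id, probability) pairs at or above threshold, best first."""
    hits = index.get(fold(query), {})
    ranked = [(img, p) for img, p in hits.items() if p >= threshold]
    return sorted(ranked, key=lambda pair: -pair[1])


# Made-up example data, for illustration only.
spots = [
    ("Müller", "p1", 0.9),
    ("Muller", "p2", 0.6),
    ("Müller", "p3", 0.05),
    ("Bauer", "p1", 0.8),
]
index = build_index(spots, min_prob=0.1)
results = search(index, "muller", threshold=0.5)
```

Raising `min_prob` shrinks the index at the cost of missing low-confidence matches, which is the kind of memory versus search-performance tradeoff the abstract refers to. Structured queries over tabular data are not covered by this sketch.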