Publications

Abstract

Verónica Romero, Alicia Fornés, Joan-Andreu Sánchez, Enrique Vidal. Using the MGGI Methodology for Category-based Language Modeling in Handwritten Marriage Licenses Books. 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), 2016. pp. 331-336.

Handwritten marriage license books have been used for centuries by ecclesiastical and secular institutions to register marriages. The information contained in these historical documents is useful for demographic studies and genealogical research, among others. Despite the generally simple structure of the text in these documents, automatic transcription and semantic information extraction are difficult because of the distinct and evolving vocabulary, which is composed mainly of proper names that change over time. In previous works we studied the use of category-based language models both to improve automatic transcription accuracy and to ease the extraction of semantic information. Here we analyze the main causes of the semantic errors observed in previous results and apply a Grammatical Inference technique known as MGGI to improve the semantic accuracy of the resulting language model. Using this language model, full handwritten text recognition experiments have been carried out, with results supporting the interest of the proposed approach.