
Abstract

Verónica Romero, Joan-Andreu Sánchez, Nicolás Serrano, Enrique Vidal. Evaluating a post-editing approach for handwriting transcription. Proceedings of the Conference on Natural Language Processing (Language Technology for Historical Text Workshop), 2012, pp. 357-364.

Marriage license books are documents that ecclesiastical institutions used for centuries to register marriage licenses. These books, which were handwritten until the beginning of the 20th century, contain valuable information for demographic studies and genealogical research. This information is usually collected by expert demographers, who devote a great deal of time to transcribing the books manually. As the accuracy of automatic handwritten text recognizers improves, post-editing the output of these recognizers could become a viable alternative. Unfortunately, most handwriting recognition techniques require large amounts of annotated images to train the recognition engine. In this paper we study how the accuracy of the handwriting recognition system improves with the amount of training data, and how human efficiency increases during the transcription of a marriage license book.
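The abstract does not say how recognition accuracy or post-editing effort is measured. A common choice in handwritten text recognition is the word error rate (WER): the minimum number of word substitutions, insertions and deletions needed to turn the recognizer's output into the reference transcription, divided by the number of words in the reference. It gives a rough estimate of how many words a human post-editor would have to correct. The sketch below is illustrative only. It assumes simple whitespace tokenization, the function name `word_error_rate` is hypothetical, and it is not presented as the metric or implementation used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: word-level Levenshtein distance divided by reference length.

    Assumes whitespace tokenization; a real evaluation may normalize
    punctuation, case or abbreviations first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (row i-1 of the DP table).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    # Guard against division by zero for an empty reference.
    return prev[len(hyp)] / max(len(ref), 1)


if __name__ == "__main__":
    # Hypothetical transcription line, not taken from the paper's data.
    reference = "rebere de Joan Pujol pages"
    hypothesis = "rebere de Joan Puiol pages"
    print(word_error_rate(reference, hypothesis))  # one substitution out of 5 words: 0.2
```

Under this assumption, a post-editor correcting this line would fix one word out of five. Tracking such a figure as the amount of training data grows is one way the accuracy trend could be reported.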