Handwriting recognition in historical documents using very large vocabularies. Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing (HIP '13), 2013. pp. 67-72. ACM.

Language models are used in automatic transcription systems to resolve ambiguities. This is done by limiting the vocabulary of words that can be recognized and by estimating the n-gram probabilities of the words in the given text. In the context of historical documents, non-unified spelling and the limited amount of written text pose a substantial problem for both the selection of the recognizable vocabulary and the computation of the word probabilities. In this paper we propose, for the transcription of historical Spanish text, to keep the corpus for the n-gram model limited to a sample of the target text but to expand the vocabulary with words gathered from external resources. We analyze the performance of such a transcription system with different sizes of external vocabularies and demonstrate the applicability of the approach and the significant increase in recognition accuracy obtained by using up to 300 thousand external words.
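
The idea in the abstract, n-gram counts taken from a small sample of the target text combined with a vocabulary enlarged from external word lists, can be illustrated with a minimal sketch. This is not the authors' system: the bigram order, the additive smoothing, the function name `build_bigram_model`, and the toy data are all assumptions chosen for illustration.

```python
from collections import Counter


def build_bigram_model(target_tokens, external_vocab, alpha=1.0):
    """Bigram model with counts from the target sample only.

    The vocabulary is the union of the sample's words and an external
    word list. Additive (add-alpha) smoothing is an assumption here;
    it gives external words a non-zero probability even though they
    never occur in the sample.
    """
    vocab = set(target_tokens) | set(external_vocab)
    # Count how often each word appears as a bigram context.
    context_counts = Counter(target_tokens[:-1])
    bigram_counts = Counter(zip(target_tokens, target_tokens[1:]))
    vocab_size = len(vocab)

    def prob(prev, word):
        # Words outside the recognizable vocabulary cannot be output.
        if word not in vocab:
            return 0.0
        numerator = bigram_counts[(prev, word)] + alpha
        denominator = context_counts[prev] + alpha * vocab_size
        return numerator / denominator

    return vocab, prob


# Toy data (hypothetical): a tiny "target text" sample and external words.
sample = "en el nombre de dios en el año".split()
external = {"rey", "reino", "dios"}

vocab, prob = build_bigram_model(sample, external)
print(len(vocab))           # size of the expanded vocabulary
print(prob("en", "el"))     # bigram seen in the sample
print(prob("el", "rey"))    # external word, never seen in the sample
```

With this setup an external word such as "rey" becomes recognizable with a small smoothed probability, while the counts themselves still come only from the target sample. A larger external list widens coverage but spreads the smoothed probability mass over more words, which is the trade-off the paper examines for different vocabulary sizes.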