Publications

Abstract

Alejandro H. Toselli, Verónica Romero, Enrique Vidal. Alignment between Text Images and their Transcripts for Handwritten Documents. Language Technology for Cultural Heritage. Springer, 2011. Theory and Applications of Natural Language Processing, pp. 23-37. Caroline Sporleder, Antal van den Bosch and Kalliopi Zervanou (Eds.)

An alignment method based on the Viterbi algorithm is proposed to find mappings between the word images of a given handwritten document and their respective (ASCII) words in its transcription. The approach takes advantage of the underlying segmentation produced by Viterbi decoding in handwritten text recognition based on Hidden Markov Models (HMMs). Two levels of alignment are considered: the traditional one at word level, and one at text-line level, where pages are transcribed without line-break synchronization. According to the various metrics used to measure alignment quality, satisfactory results are obtained. Furthermore, the presented alignment approach is tested on two HMM modelling schemes: one using 78 HMMs (one HMM per character class) and the other using two HMMs (one for blank space and one for non-blank characters).
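
The idea of reading a segmentation off a Viterbi path can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each transcript word is a single left-to-right state, that a hypothetical score matrix `log_lik[t][k]` (log-likelihood of image frame `t` under word `k`) is already available, and that every word covers at least one frame. The function name `forced_align` is likewise an assumption made for illustration. A real system would instead concatenate character-level HMMs, as described in the abstract.

```python
import math
from typing import List, Tuple


def forced_align(log_lik: List[List[float]]) -> List[Tuple[int, int]]:
    """Return one inclusive (start_frame, end_frame) span per word.

    log_lik[t][k] is an assumed, precomputed log-likelihood of frame t
    under word k. Words must be visited in order, one or more frames each.
    """
    n_frames = len(log_lik)
    n_words = len(log_lik[0])
    if n_frames < n_words:
        raise ValueError("need at least one frame per word")

    neg_inf = -math.inf
    score = [[neg_inf] * n_words for _ in range(n_frames)]
    back = [[0] * n_words for _ in range(n_frames)]
    score[0][0] = log_lik[0][0]  # the path must start in the first word

    for t in range(1, n_frames):
        for k in range(n_words):
            stay = score[t - 1][k]
            move = score[t - 1][k - 1] if k > 0 else neg_inf
            if move > stay:
                best, prev = move, k - 1
            else:
                best, prev = stay, k
            if best > neg_inf:
                score[t][k] = best + log_lik[t][k]
                back[t][k] = prev

    # Backtrace from the last word at the last frame.
    k = n_words - 1
    path = [k]
    for t in range(n_frames - 1, 0, -1):
        k = back[t][k]
        path.append(k)
    path.reverse()

    # Collapse the frame-level path into per-word spans.
    spans = []
    start = 0
    for t in range(1, n_frames + 1):
        if t == n_frames or path[t] != path[t - 1]:
            spans.append((start, t - 1))
            start = t
    return spans


if __name__ == "__main__":
    # Toy scores: frames 0-1 favour word 0, frames 2-4 word 1, frame 5 word 2.
    toy = [
        [0, -5, -5],
        [0, -5, -5],
        [-5, 0, -5],
        [-5, 0, -5],
        [-5, 0, -5],
        [-5, -5, 0],
    ]
    print(forced_align(toy))
```

On the toy scores, the recovered spans are the word boundaries implied by the best monotonic path; mapping frame indices back to pixel columns of the line image would give the word-image bounding regions.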