Alejandro H. Toselli, Verónica Romero, Moisés Pastor-i-Gadea, Enrique Vidal. Multimodal Interactive Transcription of Text Images. Pattern Recognition, 2010. Vol. 43 (5), pp. 1814-1825.

To date, automatic handwriting recognition systems are far from perfect, and heavy human intervention is often required to check and correct their results. This post-editing process is both inefficient and uncomfortable for the user. An example is the transcription of historic documents: state-of-the-art handwritten text recognition technology is not suitable for performing this task automatically, and expensive work by paleography experts is needed to achieve correct transcriptions. As an alternative to fully manual transcription and post-editing, a multimodal interactive approach is proposed here in which user feedback is provided by means of touch-screen pen strokes and/or more traditional keyboard and mouse operation. User feedback directly improves system accuracy, while multimodality increases system ergonomics and user acceptability. Multimodal interaction is approached in such a way that the main and the feedback data streams help each other to optimize overall performance and usability. Empirical tests on three cursive handwriting tasks suggest that, using this approach, considerable amounts of user effort can be saved with respect to both purely manual work and non-interactive post-editing.