Emilio Granell, Verónica Romero, Carlos D. Martínez-Hinarejos. Using Speech and Handwriting in an Interactive Approach for Transcribing Historical Documents. Handwriting: Recognition, Development and Analysis, 2017. pp. 277-295. Nova Science Publishers, Inc.

Transcription of historical documents is an important task for libraries, since it provides efficient access to the textual content of digitised historical documents. The manual transcription process is carried out by professionals called paleographers. In recent years, the use of handwritten text recognition systems has sped up this manual process. However, state-of-the-art handwritten text recognition systems are far from perfect, and revision by a paleographer is required to produce a transcription of standard quality. Nevertheless, the initial result of automatic recognition may ease the paleographer's task, since corrections can be made on a draft transcription. Moreover, a multimodal interactive assistive scenario, where the automatic system and the paleographer cooperate to generate the perfect transcription, would reduce the time and effort required to obtain the final result. In this context, the assistive transcription system proposes a hypothesis, usually derived from a recognition process. The recognition can be unimodal (e.g., from a handwritten text image or from the audio signal of its dictation) or multimodal (from two or more signals that represent the same sequence of words). The paleographer then reads the hypothesis and produces a feedback signal (first error correction, positioning, etc.), which the system uses to provide an alternative hypothesis, starting a new cycle. This process is repeated until a perfect transcription is obtained. In this work, we present a multimodal interactive transcription system where user feedback is provided by means of touchscreen pen strokes and traditional keyboard and mouse operations. The combination of the main and feedback data streams is based on Confusion Networks derived from the output of three recognition systems: two handwritten text recognition systems (off-line and on-line) and an automatic speech recognition system.
Off-line text recognition and speech recognition are used to derive the initial hypothesis (each on its own, or by combining their recognition results), while on-line text is used to provide feedback. The proposed multimodal interactive assistive system not only reduces the required transcription effort, but also improves overall performance and usability, allowing for a faster and more comfortable transcription process.
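To give an intuition for how Confusion Networks can combine the outputs of several recognisers, the following sketch interpolates slot posteriors from two aligned networks and reads off the best word per slot. This is a simplified illustration, not the chapter's actual implementation: slot alignment between modalities is assumed to be given (in practice it requires an alignment step of its own), the interpolation weights and example posteriors are invented, and epsilon (deletion) arcs are omitted.

```python
# Illustrative sketch: a Confusion Network is modelled as a list of slots,
# where each slot maps candidate words to posterior probabilities.

def combine_confusion_networks(cn_a, cn_b, weight_a=0.5):
    """Interpolate the slot posteriors of two slot-aligned confusion networks."""
    weight_b = 1.0 - weight_a
    combined = []
    for slot_a, slot_b in zip(cn_a, cn_b):
        slot = {}
        # Union of candidate words from both modalities; a word missing from
        # one network contributes posterior 0.0 from that side.
        for word in set(slot_a) | set(slot_b):
            slot[word] = weight_a * slot_a.get(word, 0.0) + weight_b * slot_b.get(word, 0.0)
        combined.append(slot)
    return combined

def best_hypothesis(cn):
    """Pick the highest-posterior word in each slot (consensus decoding)."""
    return [max(slot, key=slot.get) for slot in cn]

# Hypothetical posteriors from an off-line HTR system and an ASR system
# for the same three-word span.
htr = [{"the": 0.9, "he": 0.1}, {"cat": 0.4, "cart": 0.6}, {"sat": 0.8, "sad": 0.2}]
asr = [{"the": 0.95, "a": 0.05}, {"cat": 0.7, "cap": 0.3}, {"sat": 0.6, "sad": 0.4}]

print(best_hypothesis(combine_confusion_networks(htr, asr)))  # → ['the', 'cat', 'sat']
```

Note that with the HTR network alone, the second slot would be decoded as "cart" (posterior 0.6); interpolating with the ASR posteriors flips the decision to "cat", which is the kind of cross-modal error correction that motivates combining recognisers.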