Publications

Abstract

Emilio Granell, Verónica Romero, Carlos D. Martínez-Hinarejos. Study of the influence of lexicon and language restrictions on computer assisted transcription of historical manuscripts. Neurocomputing, 2020. Vol. 390, pp. 12-27.

State-of-the-art Handwritten Text Recognition (HTR) systems allow transcribers to speed up the transcription of handwritten text images. These systems provide transcribers with an initial draft transcription that can be corrected with less effort than transcribing the handwritten text images from scratch. Currently, even the draft transcriptions offered by the most advanced HTR systems contain errors. Therefore, supervision of the draft by a human transcriber is still necessary to obtain a correct transcription. This supervision can be eased by using interactive and assistive transcription systems, in which the transcriber and the automatic system cooperate in the correction process. In this paper, the draft transcription is provided by an HTR system based on Convolutional and Recurrent Neural Networks with Bidirectional Long Short-Term Memory units, and the assistive system is fed with lattices generated using Weighted Finite State Transducers. The influence of lexicon and language restrictions on the performance of our computer-assisted transcription system is evaluated on three historical manuscripts. The transcriptions offered by the proposed HTR system present very low error rates for the studied historical manuscripts. Nevertheless, our assistive transcription system without lexicon or language restrictions further reduces the human effort required to correct the transcriptions by more than 50% relative to correcting the transcriptions offered by the HTR system.