Verónica Romero, Joan-Andreu Sánchez. Human Evaluation of the Transcription Process of a Marriage License Book. International Conference on Document Analysis and Recognition, 2013. pp. 1287-1291. IEEE Computer Society Conference Publishing Services (CPS).

Handwriting Text Recognition (HTR) of historical documents is an important research field within Document Image Analysis. Currently, the most widely accepted technology for offline HTR is based on holistic, segmentation-free techniques that require no character or word segmentation. This technology relies on stochastic models trained with annotated data. Its performance is still far from perfect, so user intervention is necessary to obtain perfect transcripts. This intervention can take the form of post-editing, in which the user corrects the errors produced by an automatic HTR system. As an alternative to post-editing, interactive techniques have been proposed in the past few years. In these interactive approaches, the user and the system work in tight mutual collaboration to obtain the perfect transcript, and the feedback provided by the user is used to improve the system output. In both the post-editing and the interactive scenarios, the transcribed material can be used to retrain the models as the data is processed. In this research we carried out a study with a real transcriber to examine how the performance of an HTR system improved with the amount of training data, and how human efficiency improved during the transcription process in both transcription scenarios.