Duration: 1 January 2016 to 31 December 2018
Supported by: under reference TIN2015-70924-C2-1-R

The processing of handwritten documents is of wide interest for many purposes, such as the preservation of cultural heritage. Over the last decade, handwritten text recognition techniques have been successfully applied to obtain transcriptions of handwritten documents, and keyword spotting techniques have been applied to search for specific terms in image collections of handwritten documents. However, transcription and indexing results are still far from perfect, even though the models used for handwritten text recognition and keyword spotting have evolved considerably and current results are much better than those of the initial proposals. In this framework, the use of new data sources arises as a new paradigm that will allow for better transcription and indexing of handwritten documents. Three main data sources can be considered: the context of the document (style, writer, historical period, topics, …), multimodal data (representations of the document in a different modality, such as the speech signal of a dictation of the text), and user feedback (corrections, amendments, …). The CoMUN-HaT project aims to integrate these different data sources into the transcription and indexing task for handwritten documents. The project will study the use of context derived from the analysis of the documents in order to improve the recognition results used for transcription and indexing. It will also explore how multimodality can aid the recognition process to obtain more accurate transcriptions. In addition, in the case of ancient documents it could be of great interest to obtain an automatic transcription in a modern version of the language, or even in a different language. The final aim is to include context, multimodality, and modernisation in a user-in-the-loop assisted text transcription framework, with features such as document contextualisation, multimodal input, term translation, and user feedback.
This aim will be reflected in the construction of a transcription and indexing platform that can be used by both professional and non-professional users, contributing to crowd-sourcing activities to preserve cultural heritage and to obtain an accessible version of the corpora involved. Finally, the generated data could be accessed by professionals in a given field or by non-professional users with different interests.