Alejandro H. Toselli, Luis A. Leiva, Isabel Bordes-Cabrera, Celio Hernández-Tornero, Vicente Bosch, Enrique Vidal. Transcribing a 17th-century botanical manuscript: Longitudinal evaluation of document layout detection and interactive transcription. Digital Scholarship in the Humanities, 2017.

We present a process for cost-effective transcription of cursive handwritten text images, tested on a 1,000-page 17th-century book about botanical species. The process comprises two main tasks: (1) preprocessing, which covers page layout analysis, text line detection, and text line extraction; and (2) transcription of the extracted text line images. Both tasks were carried out with semiautomatic procedures aimed at incrementally minimizing user correction effort, by means of computer-assisted line detection and interactive handwritten text recognition technologies. The contribution of this work is threefold. First, we provide a detailed human-supervised transcription of a relatively large historical handwritten book, ready to be searched, indexed, and accessed by cultural heritage scholars as well as the general public. Second, we have conducted the first longitudinal study to date on interactive handwritten text recognition, for which we provide a comprehensive user assessment of the real-world performance of the technologies involved. Third, as a result of this process, we have produced a detailed transcription and document layout information (i.e., high-quality labeled data) ready to be used by researchers working on automated technologies for document analysis and recognition.