

Abstract

Moisés Pastor-i-Gadea. Text Baseline Detection, a single page trained system. Pattern Recognition, 2019. Vol. 94, pp. 149-161.

Nowadays, large numbers of page images are available, and the scanning process is well solved and can be carried out industrially. However, handwritten text recognition (HTR) systems can only process images of single text lines. Segmenting pages into single text-line images is a very expensive process that has traditionally been done manually, and it is a bottleneck holding back massive industrial document processing. A baseline detection method is presented here. The initial problem is reformulated as a clustering problem over a set of interest points. The method is designed to be fast and robust to the noise artifacts that usually appear in historical manuscripts: variable interline spacing, overlapping and touching words in adjacent lines, humidity spots, etc. Results show that this system can be used to locate the text lines in pages on a massive scale.

Highlight: This system reached second place in the ICDAR 2017 Competition on Baseline Detection (see Table 1).
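The abstract states only that baseline detection is reformulated as clustering over a set of interest points; it does not say which points or which clustering algorithm the paper uses. As a rough illustration of the general idea (not the author's method), the Python sketch below assumes interest points have already been extracted from a page as hypothetical `Point` records in pixel coordinates. It groups them into lines by single-linkage clustering on the vertical coordinate, using a hypothetical `max_gap` threshold, and fits a least-squares baseline y = a*x + b to each group.

```python
from dataclasses import dataclass


@dataclass
class Point:
    """Hypothetical interest point in page pixel coordinates (y grows downward)."""
    x: float
    y: float


def cluster_by_row(points, max_gap):
    """Single-linkage clustering on y: in 1-D this amounts to sorting the
    points by y and starting a new cluster wherever the vertical gap to the
    previous point exceeds max_gap (an assumed, hand-tuned threshold)."""
    ordered = sorted(points, key=lambda p: p.y)
    clusters = []
    for p in ordered:
        if clusters and p.y - clusters[-1][-1].y <= max_gap:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return clusters


def fit_baseline(cluster):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(cluster)
    mean_x = sum(p.x for p in cluster) / n
    mean_y = sum(p.y for p in cluster) / n
    sxx = sum((p.x - mean_x) ** 2 for p in cluster)
    if sxx == 0:
        # All points share one x: fall back to a horizontal line through them.
        return 0.0, mean_y
    sxy = sum((p.x - mean_x) * (p.y - mean_y) for p in cluster)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def detect_baselines(points, max_gap):
    """Cluster the interest points into lines, then fit one baseline per line,
    ordered from the top of the page to the bottom."""
    return [fit_baseline(c) for c in cluster_by_row(points, max_gap)]


# Usage: two short, well-separated lines of points.
pts = [Point(0, 100), Point(10, 101), Point(20, 102),
       Point(0, 150), Point(10, 150), Point(20, 150)]
baselines = detect_baselines(pts, max_gap=10)
```

This naive grouping assumes roughly horizontal, well-separated lines. It would not cope with the skew, touching words in adjacent lines, or noise described in the abstract, which are exactly the cases a dedicated baseline detector has to handle.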