Moisés Pastor-i-Gadea. Aportaciones al reconocimiento automático de texto manuscrito [Contributions to automatic handwritten text recognition]. Dep. de Sistemes Informàtics i Computació. 2007. Advisors: E. Vidal and A. H. Toselli

This thesis addresses the robustness of automatic handwritten text recognizers. Such systems will only come into widespread use when they can offer any user a reasonable productivity without requiring any specific training. They should not demand any effort beyond what a writer would make when writing for a human reader. It is therefore necessary to build systems that are robust and flexible with respect to their input.

Signal preprocessing aims to make the system invariant to all sources of variability that do not help to classify the handwritten text. At present there is no standard solution for achieving invariance to handwriting style; every system uses its own ad hoc solution. This thesis explores several methods for normalizing the off-line input signal. To this end, a broad study of preprocessing algorithms is carried out. The algorithms are classified as page-level (thresholding, noise reduction and skew angle correction) and text-level (slope and slant angle correction, and character size normalization).

Writer-dependent systems consistently achieve better recognition results than writer-independent systems. On the other hand, collecting a large number of training samples for a writer-independent system is easier than collecting a large number of samples from a single writer. This work studies how to adapt writer-independent systems for use by a single writer. This would allow the writer to write in a more relaxed way without any loss in system productivity.

Automatic handwritten text recognition systems are not free of errors. It is therefore useful to know not only how many errors there are, but also which hypothesis units are errors, or are suspected of being misclassified, so that they can be corrected manually. In this thesis, hypothesis verification techniques that have proved successful in speech recognition are adapted for use in automatic handwritten text recognition systems.
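To make the text-level slant correction listed above more concrete, here is a minimal sketch of one common family of approaches: shear the binarized word or line image by a range of candidate angles and keep the angle whose vertical projection profile is most peaked. The peakedness measure (sum of squared column counts), the image representation (a list of 0/1 rows, 1 meaning ink), the angle range and the function names `estimate_slant` and `deslant` are hypothetical choices for illustration; this is not necessarily the method evaluated in the thesis.

```python
import math

def _ink_pixels(image):
    """Return (x, y) for every ink pixel; y counts up from the bottom row."""
    h = len(image)
    return [(x, h - 1 - r)
            for r, row in enumerate(image)
            for x, v in enumerate(row) if v]

def profile_score(pixels, angle_deg):
    """Peakedness of the vertical projection after shearing by angle_deg.

    Positive angles undo strokes that lean to the right. The score (an
    assumed measure) is the sum of squared column counts, which grows
    when ink piles up in few columns.
    """
    t = math.tan(math.radians(angle_deg))
    counts = {}
    for x, y in pixels:
        col = x - round(y * t)  # shift each row left in proportion to its height
        counts[col] = counts.get(col, 0) + 1
    return sum(c * c for c in counts.values())

def estimate_slant(image, angles=range(-45, 46)):
    """Candidate angle (degrees) with the most peaked profile.

    Ties are broken toward the smallest absolute angle.
    """
    pixels = _ink_pixels(image)
    return max(angles, key=lambda a: (profile_score(pixels, a), -abs(a)))

def deslant(image, angle_deg):
    """Shear the image so that strokes slanted by angle_deg become vertical."""
    h, w = len(image), len(image[0])
    t = math.tan(math.radians(angle_deg))
    shifts = [round((h - 1 - r) * t) for r in range(h)]  # bottom row never moves
    off = max(shifts)  # >= 0, keeps every new column index non-negative
    out = [[0] * (w + off - min(shifts)) for _ in range(h)]
    for r, row in enumerate(image):
        for x, v in enumerate(row):
            if v:
                out[r][x - shifts[r] + off] = 1
    return out

# Usage: a synthetic stroke leaning 45 degrees to the right.
h, w = 10, 20
img = [[0] * w for _ in range(h)]
for r in range(h):
    img[r][2 + (h - 1 - r)] = 1
angle = estimate_slant(img)
straight = deslant(img, angle)
print(angle, {x for row in straight for x, v in enumerate(row) if v})
```

Because the shear is applied in whole-pixel steps, neighbouring angles can yield identical profiles on small images (the synthetic 45-degree stroke may be reported as 44 degrees), which is why ties are resolved toward the smaller correction.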
Automatic continuous speech recognition has important similarities with automatic handwritten text recognition. For this reason, the speech recognition engine ATROS (Automatically Trainable Recognizer Of Speech) [PSCV01] is adapted for use as a handwritten text recognizer. ATROS is a recognition engine based on segmentation-free models, namely hidden Markov models.
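Recognition with hidden Markov models amounts to finding the most likely hidden state sequence for an observation sequence, which is usually done with the Viterbi algorithm. The toy sketch below implements Viterbi decoding in the log domain for a discrete-emission HMM; the three-state left-to-right model, its probabilities and the helper names `safe_log` and `viterbi` are made up for illustration. Practical handwriting recognizers typically use continuous emission densities over feature vectors extracted from the image, so this is a simplification and not the ATROS implementation.

```python
import math

def safe_log(p):
    """Natural log that maps probability 0 to -inf (impossible event)."""
    return math.log(p) if p > 0 else -math.inf

def viterbi(obs, states, start, trans, emit):
    """Most likely state path for a discrete-emission HMM, computed in log space.

    start[s], trans[p][s] and emit[s][o] are plain probabilities.
    Returns (path, log_probability_of_path).
    """
    # delta[s]: best log-probability of any path that ends in state s now
    delta = {s: safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in states}
    backptrs = []
    for o in obs[1:]:
        prev, delta, bp = delta, {}, {}
        for s in states:
            # best predecessor of s under the transition model
            best = max(states, key=lambda p: prev[p] + safe_log(trans[p][s]))
            delta[s] = prev[best] + safe_log(trans[best][s]) + safe_log(emit[s][o])
            bp[s] = best
        backptrs.append(bp)
    # trace back from the best final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    path.reverse()
    return path, delta[last]

# Hypothetical left-to-right model: state i mostly emits symbol "abc"[i].
states = [0, 1, 2]
start = {0: 1.0, 1: 0.0, 2: 0.0}
trans = {0: {0: 0.6, 1: 0.4, 2: 0.0},
         1: {0: 0.0, 1: 0.6, 2: 0.4},
         2: {0: 0.0, 1: 0.0, 2: 1.0}}
emit = {0: {"a": 0.8, "b": 0.1, "c": 0.1},
        1: {"a": 0.1, "b": 0.8, "c": 0.1},
        2: {"a": 0.1, "b": 0.1, "c": 0.8}}
path, score = viterbi(["a", "a", "b", "c", "c"], states, start, trans, emit)
print(path, math.exp(score))
```

Working in log space avoids numerical underflow on long observation sequences, and the left-to-right topology (no backward transitions) mirrors the kind of model commonly used for characters and words.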