Publications

Abstract

Alejandro H. Toselli, Enrique Vidal, Verónica Romero, Volkmar Frinken. HMM Word Graph Based Keyword Spotting in Handwritten Document Images. Information Sciences, 2016. Vol. 370-371, pp. 497-518.

Line-level keyword spotting (KWS) is approached here on the basis of frame-level word posterior probabilities. These posteriors are obtained by means of word graphs derived from the recognition process of a full-fledged handwritten text recognizer based on hidden Markov models and N-gram language models. This approach has several advantages: a) since it uses holistic, segmentation-free technology, it does not require any kind of word or character segmentation; b) the use of language models makes it easy to take advantage of the context of each spotted word, thereby considerably increasing KWS accuracy; and c) the proposed KWS scores are based on true posterior probabilities, computed taking into account all (or most) possible word segmentations of the input image; since these scores are properly bounded and normalized, they lead to smooth threshold-based search which, in real use, makes it possible to achieve comfortable tradeoffs between search precision and recall. Experiments are carried out with several historic collections of handwritten text images, as well as with a well-known dataset of modern English handwritten text. According to the empirical results, the proposed approach achieves KWS results comparable to those obtained with the recently introduced "BLSTM neural networks KWS" approach and clearly outperforms those of one of the most popular state-of-the-art KWS methods, known as "Filler HMM". Overall, the results clearly support all the above claimed advantages of the proposed approach.
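
The threshold-based search mentioned in point c) can be illustrated with a small sketch. It is not the paper's implementation: it assumes the frame-level posteriors of a single keyword are already available for each text line, and it assumes, for illustration only, that a line-level score is taken as the maximum posterior over the frames of the line. The names `line_score`, `spot`, `precision_recall` and all toy values are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def line_score(frame_posteriors: List[float]) -> float:
    """Collapse the per-frame posteriors of one keyword into a line-level score.

    Assumption for this sketch: the line score is the maximum frame posterior.
    Because posteriors lie in [0, 1], the score is bounded in [0, 1] as well.
    """
    return max(frame_posteriors, default=0.0)


def spot(lines: Dict[str, List[float]], threshold: float) -> Set[str]:
    """Return the ids of the lines whose score reaches the threshold."""
    return {line_id for line_id, post in lines.items()
            if line_score(post) >= threshold}


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision and recall of a retrieved set against the relevant set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall


# Made-up frame posteriors for one keyword on three lines (not real data).
toy_lines = {
    "line_a": [0.02, 0.10, 0.91, 0.88, 0.05],
    "line_b": [0.01, 0.30, 0.35, 0.02],
    "line_c": [0.00, 0.03, 0.04],
}
# Hypothetical ground truth: the keyword really occurs in line_a and line_b.
toy_relevant = {"line_a", "line_b"}

if __name__ == "__main__":
    # Lowering the threshold retrieves more lines: recall rises, precision may fall.
    for threshold in (0.5, 0.2, 0.03):
        retrieved = spot(toy_lines, threshold)
        print(threshold, sorted(retrieved), precision_recall(retrieved, toy_relevant))
```

Since the scores are bounded and normalized, a single threshold can be swept across all keywords and lines, which is what allows a user to trade precision against recall.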