Publications

Abstract

Juan-Carlos Amengual, Alberto Sanchis, Enrique Vidal, José-Miguel Benedí. Language Simplification through Error-Correcting and Grammatical Inference Techniques. Machine Learning, 2001. Vol. 44 pp. 143-159.

In many language processing tasks, most sentences convey rather simple meanings. Moreover, these tasks have a limited semantic domain that can be adequately covered with a simple lexicon and a restricted syntax. Nevertheless, casual users cannot be expected to comply with formal syntactic restrictions, given the inherently "spontaneous" nature of human language. In this work, error-correcting-based learning techniques are proposed to cope with the complex syntactic variability generally exhibited by natural language. In our approach, a complex task is modeled in terms of a basic finite-state model, F, and a stochastic error model, E. F should account for the basic (syntactic) structures underlying the task, which convey the meaning. E should account for general vocabulary variations, word disappearance, superfluous words, and so on. Each "natural" user sentence is thus considered a corrupted version (according to E) of some "simple" sentence of L(F). Adequate bootstrapping procedures are presented that incrementally improve the "structure" of F while estimating the probabilities of the operations of E. These techniques have been applied to a practical task of moderately high syntactic variability, and results showing the potential of the proposed approach are presented.
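
The idea of treating a user sentence as a corrupted version of some sentence of L(F) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the toy automaton TOY_F, and the uniform edit costs are all assumptions made for illustration. The costs stand in for negative log-probabilities of the operations of E (substitution for vocabulary variation, deletion for word disappearance, insertion for superfluous words), and the bootstrapping that refines F and estimates E is not reproduced here. The sketch runs a shortest-path search over pairs of input position and state of F.

```python
import heapq

# Hypothetical toy model F: a deterministic finite-state automaton given as
# {state: {word: next_state}}. It accepts "show flights to boston|madrid".
TOY_F = {0: {"show": 1}, 1: {"flights": 2}, 2: {"to": 3},
         3: {"boston": 4, "madrid": 4}}

def error_correcting_parse(words, transitions, start, finals,
                           sub_cost=1.0, ins_cost=1.0, del_cost=1.0):
    """Return (cost, simple_sentence): the cheapest sentence of L(F) that
    explains `words` under the assumed edit costs, or (inf, None)."""
    n = len(words)
    best = {(0, start): 0.0}   # cheapest known cost per (input position, state)
    back = {}                  # node -> (previous node, word emitted by F or None)
    heap = [(0.0, 0, start)]
    while heap:
        cost, i, q = heapq.heappop(heap)
        if cost > best.get((i, q), float("inf")):
            continue           # stale heap entry
        if i == n and q in finals:
            # Costs are non-negative, so the first final node popped is optimal.
            simple, node = [], (i, q)
            while node in back:
                node, sym = back[node]
                if sym is not None:
                    simple.append(sym)
            return cost, simple[::-1]
        moves = []
        if i < n:
            # Superfluous input word: consume it without moving in F.
            moves.append((i + 1, q, ins_cost, None))
        for sym, q2 in transitions.get(q, {}).items():
            # Word of the simple sentence that disappeared from the input.
            moves.append((i, q2, del_cost, sym))
            if i < n:
                # Exact match (free) or vocabulary variation (substitution).
                step = 0.0 if sym == words[i] else sub_cost
                moves.append((i + 1, q2, step, sym))
        for i2, q2, step, sym in moves:
            new_cost = cost + step
            if new_cost < best.get((i2, q2), float("inf")):
                best[(i2, q2)] = new_cost
                back[(i2, q2)] = ((i, q), sym)
                heapq.heappush(heap, (new_cost, i2, q2))
    return float("inf"), None
```

For instance, under these assumed costs, calling error_correcting_parse("please show me flights to boston".split(), TOY_F, 0, {4}) maps the input to the simple sentence "show flights to boston" at the price of two insertions. In a stochastic setting the three costs would be replaced by per-word values derived from the estimated probabilities of E.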