Publications

Abstract

Elsa Cubel, Jorge Civera, Enrique Vidal. On the use of grammatical inference techniques for bilingual text classification. Workshop on Grammatical Inference Applications: Successes and Future Challenges, 2005. pp. 46-50.

Bilingual documentation has become a common phenomenon in many official institutions and private companies. In this scenario, the categorisation of bilingual text is a useful tool that can also be applied to the machine translation field. In the present work, three grammatical inference algorithms are used to tackle this bilingual classification task. In addition, smoothed $n$-gram language models are introduced as an alternative, naive way of modelling bilingual information. To evaluate the performance of stochastic finite-state transducers as bilingual classifiers, several experiments on two categorised bilingual corpora of different complexity were undertaken. The first experiments, on a limited-domain corpus, show that finite-state transducers obtain results similar to those of $n$-gram statistical models. In contrast, the assessment of these inference algorithms on a real-world task reveals their fragility when dealing with an insufficient amount of data.