Abstract

Jorge Civera, Elsa Cubel, Alfons Juan, Enrique Vidal. Different approaches to bilingual text classification based on grammatical inference techniques. 2nd Iberian Conference on Pattern Recognition and Image Analysis, 2005. pp. 630-637. Springer-Verlag.

Bilingual documentation has become a common phenomenon in many official institutions and private companies. In this scenario, the categorization of bilingual text is a useful tool that can also be applied in the machine translation field. To tackle this classification task, different approaches will be proposed. On the one hand, two finite-state transducer algorithms from the grammatical inference domain will be discussed. On the other hand, the well-known naive Bayes approximation will be presented along with a possible model based on n-gram language models. Experiments carried out on a bilingual corpus have demonstrated the adequacy of these methods and the relevance of a second information source in text classification, as supported by classification error rates. A relative reduction of 29% in classification error rate has been obtained with respect to the best previous results on the monolingual version of the same task.
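
As a rough illustration of the naive Bayes idea mentioned in the abstract, the following minimal sketch classifies a bilingual sentence pair by combining a class prior with one language model per language. It is not the authors' implementation: the class name `BilingualNaiveBayes`, the use of unigram (1-gram) models, the additive smoothing constant `alpha`, the assumption that the two languages are independent given the class, and the toy Spanish-English training pairs are all hypothetical choices made for this example.

```python
import math
from collections import Counter, defaultdict


class BilingualNaiveBayes:
    """Naive Bayes over (source, target) sentence pairs.

    Assumption for this sketch: given the class, the source and target
    sentences are independent and each is scored by a smoothed unigram model.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant (hypothetical choice)
        self.class_counts = Counter()
        # counts[label] is a pair of word-frequency Counters: (source, target)
        self.counts = defaultdict(lambda: (Counter(), Counter()))
        self.vocab = (set(), set())

    def fit(self, pairs, labels):
        for (src, tgt), label in zip(pairs, labels):
            self.class_counts[label] += 1
            for side, sentence in enumerate((src, tgt)):
                words = sentence.lower().split()
                self.counts[label][side].update(words)
                self.vocab[side].update(words)
        return self

    def _log_likelihood(self, label, side, sentence):
        counter = self.counts[label][side]
        total = sum(counter.values())
        size = len(self.vocab[side]) + 1  # +1 reserves mass for unseen words
        return sum(
            math.log((counter[w] + self.alpha) / (total + self.alpha * size))
            for w in sentence.lower().split()
        )

    def predict(self, src, tgt):
        n = sum(self.class_counts.values())

        def score(label):
            # log p(c) + log p(source | c) + log p(target | c)
            return (
                math.log(self.class_counts[label] / n)
                + self._log_likelihood(label, 0, src)
                + self._log_likelihood(label, 1, tgt)
            )

        return max(self.class_counts, key=score)


# Toy training data, invented for this example only.
pairs = [
    ("la habitación doble", "the double room"),
    ("una habitación con vistas", "a room with a view"),
    ("la cuenta por favor", "the bill please"),
    ("quiero pagar la cuenta", "I want to pay the bill"),
]
labels = ["room", "room", "bill", "bill"]

model = BilingualNaiveBayes().fit(pairs, labels)
print(model.predict("una habitación doble", "a double room"))
print(model.predict("pagar la cuenta", "pay the bill"))
```

Scoring both sentences of a pair is how a second information source can enter the decision in this sketch. Dropping the target-language term from `score` would reduce it to an ordinary monolingual classifier.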