Different approaches to bilingual text classification based on grammatical inference techniques. 2nd Iberian Conference on Pattern Recognition and Image Analysis, 2005. pp. 630-637. Springer-Verlag.

Bilingual documentation has become common in many official institutions and private companies. In this scenario, the categorization of bilingual text is a useful tool that can also be applied in the machine translation field. To tackle this classification task, several approaches are proposed. On the one hand, two finite-state transducer algorithms from the grammatical inference domain are discussed. On the other hand, the well-known naive Bayes approximation is presented, along with a possible model based on n-gram language models. Experiments carried out on a bilingual corpus demonstrate the adequacy of these methods and the relevance of a second information source in text classification, as supported by the classification error rates. A relative reduction of 29% with respect to the best previous results on the monolingual version of the same task was obtained.
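
As an illustration of the naive Bayes idea mentioned in the abstract, the following is a minimal sketch and not the paper's implementation. It assumes unigram models (the simplest n-gram case), add-alpha smoothing, and conditional independence of the two languages given the class, so that the per-language log-likelihoods are simply added. The class name `BilingualNaiveBayes`, the Spanish-English toy pairs, and the labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class BilingualNaiveBayes:
    """Unigram naive Bayes over (source, target) sentence pairs (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # add-alpha smoothing constant (an assumption)
        self.class_counts = Counter()
        # counts[side][label] holds word frequencies for that language and class
        self.counts = {"src": defaultdict(Counter), "tgt": defaultdict(Counter)}
        self.vocab = {"src": set(), "tgt": set()}

    def fit(self, pairs, labels):
        for (src, tgt), label in zip(pairs, labels):
            self.class_counts[label] += 1
            for side, text in (("src", src), ("tgt", tgt)):
                words = text.lower().split()
                self.counts[side][label].update(words)
                self.vocab[side].update(words)
        return self

    def _log_likelihood(self, side, text, label):
        counts = self.counts[side][label]
        total = sum(counts.values())
        size = len(self.vocab[side]) + 1  # +1 reserves mass for unseen words
        return sum(
            math.log((counts[w] + self.alpha) / (total + self.alpha * size))
            for w in text.lower().split()
        )

    def score(self, pair, label):
        src, tgt = pair
        prior = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        # Assumed independence of the two languages given the class:
        # the second language acts as an extra information source.
        return (
            prior
            + self._log_likelihood("src", src, label)
            + self._log_likelihood("tgt", tgt, label)
        )

    def predict(self, pair):
        return max(self.class_counts, key=lambda label: self.score(pair, label))


# Hypothetical toy data: (Spanish, English) pairs with made-up class labels.
train_pairs = [
    ("quiero reservar una habitación", "i want to book a room"),
    ("hay una habitación libre", "is there a free room"),
    ("mañana va a llover", "it will rain tomorrow"),
    ("hoy hace mucho sol", "it is very sunny today"),
]
train_labels = ["hotel", "hotel", "weather", "weather"]

clf = BilingualNaiveBayes().fit(train_pairs, train_labels)
print(clf.predict(("una habitación doble", "a double room")))
```

Dropping the `tgt` term from `score` gives the monolingual counterpart, which is one simple way to compare classification with and without the second language on the same data.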