Alfons Juan, Enrique Vidal. On the use of Bernoulli mixture models for text classification. Proc. of the Workshop on Pattern Recognition in Information Systems (PRIS 01), 2001.

Mixture modelling of class-conditional densities is a standard pattern recognition technique. Although most research on mixture models has concentrated on continuous data, emerging pattern recognition applications call for extending this work to other data types. This paper focuses on applying mixtures of multivariate Bernoulli distributions to binary data. Specifically, it considers a text classification task aimed at improving language modelling for machine translation.
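As a point of reference, a mixture of K multivariate Bernoulli distributions over D-dimensional binary vectors x has density p(x) = sum_k pi_k prod_d p_kd^x_d (1 - p_kd)^(1 - x_d). The NumPy sketch below is illustrative only and is not the authors' implementation. It assumes standard EM fitting from a random initialisation, one mixture per class, and a Bayes decision rule (log class prior plus log class-conditional mixture density). The function names (`fit_bernoulli_mixture`, `bernoulli_mixture_log_density`, `classify`) and parameters (`K`, `n_iter`, `seed`, `eps`) are hypothetical.

```python
import numpy as np

def bernoulli_mixture_log_density(X, priors, probs, eps=1e-10):
    """Log p(x) for each binary row of X under a Bernoulli mixture.

    X: (N, D) array of 0/1 values; priors: (K,) mixture weights summing to 1;
    probs: (K, D) Bernoulli parameters p_kd (hypothetical parameterisation).
    """
    P = np.clip(probs, eps, 1 - eps)  # avoid log(0)
    # log prod_d p^x (1-p)^(1-x) for every (row n, component k) pair
    log_comp = X @ np.log(P).T + (1 - X) @ np.log(1 - P).T  # (N, K)
    a = log_comp + np.log(priors)
    m = a.max(axis=1, keepdims=True)  # log-sum-exp for numerical stability
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()

def fit_bernoulli_mixture(X, K, n_iter=50, seed=0, eps=1e-10):
    """Fit a K-component Bernoulli mixture to binary rows of X with EM (a sketch)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    priors = np.full(K, 1.0 / K)
    probs = rng.uniform(0.25, 0.75, size=(K, D))  # assumed random initialisation
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each row
        P = np.clip(probs, eps, 1 - eps)
        a = X @ np.log(P).T + (1 - X) @ np.log(1 - P).T + np.log(priors)
        a -= a.max(axis=1, keepdims=True)
        R = np.exp(a)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and Bernoulli parameters
        Nk = np.maximum(R.sum(axis=0), eps)  # guard against empty components
        priors = Nk / Nk.sum()
        probs = (R.T @ X) / Nk[:, None]
    return priors, probs

def classify(X, class_models, class_priors):
    """Assign each row to argmax_c [log P(c) + log p(x | c)]."""
    scores = np.column_stack([
        np.log(class_priors[c]) + bernoulli_mixture_log_density(X, *class_models[c])
        for c in range(len(class_models))
    ])
    return scores.argmax(axis=1)
```

In a text setting, each row of `X` would typically be a binary bag-of-words vector (word present or absent in a document). Using several components per class lets the class-conditional model capture distinct word-usage patterns that a single multivariate Bernoulli cannot.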