David Pinto, José-Miguel Benedí, Paolo Rosso. Clustering Narrow-Domain Short Texts by Using the Kullback-Leibler Distance. In: Alexander Gelbukh (Ed.), Computational Linguistics and Intelligent Text Processing, 2007. pp. 611-622. Springer.

Clustering short texts is a difficult task in itself, and restricting them to a narrow domain poses an additional challenge for current clustering methods. We address this problem with a new distance measure between documents based on the symmetric Kullback-Leibler distance. Although this measure is commonly used to calculate the distance between two probability distributions, we have adapted it to obtain a distance value between two documents. We carried out experiments on two different narrow-domain corpora, and our findings indicate that this measure can be used for the problem at hand, obtaining results comparable to those obtained with the Jaccard similarity measure.
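The abstract does not spell out how the Kullback-Leibler distance is adapted to documents. As a rough illustration only, the Python sketch below treats each document as a unigram term distribution over the shared vocabulary of the pair and computes the symmetric distance D(P||Q) + D(Q||P) = sum over t of (P(t) - Q(t)) log(P(t)/Q(t)), alongside the Jaccard similarity the measure is compared against. The add-alpha smoothing (needed so that no term has zero probability), the whitespace tokenization, and all helper names (`term_distribution`, `symmetric_kl`, `document_kl_distance`, `jaccard_similarity`) are hypothetical choices for this example, not the authors' adaptation.

```python
import math
from collections import Counter


def term_distribution(tokens, vocabulary, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a shared vocabulary.

    Assumption: add-alpha smoothing is used here only so that every term
    gets nonzero probability and the KL divergence stays finite.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {t: (counts[t] + alpha) / total for t in vocabulary}


def symmetric_kl(p, q):
    """Symmetric Kullback-Leibler distance D(P||Q) + D(Q||P).

    Uses the identity sum_t (p_t - q_t) * log(p_t / q_t); p and q must
    share the same keys and have strictly positive values.
    """
    return sum((p[t] - q[t]) * math.log(p[t] / q[t]) for t in p)


def document_kl_distance(doc_a, doc_b, alpha=1.0):
    """Hypothetical helper: symmetric KL distance between two token lists."""
    vocabulary = set(doc_a) | set(doc_b)
    p = term_distribution(doc_a, vocabulary, alpha)
    q = term_distribution(doc_b, vocabulary, alpha)
    return symmetric_kl(p, q)


def jaccard_similarity(doc_a, doc_b):
    """Jaccard similarity |A & B| / |A | B| between the term sets."""
    a, b = set(doc_a), set(doc_b)
    if not a and not b:
        return 1.0  # two empty documents are treated as identical
    return len(a & b) / len(a | b)


# Toy "documents" (illustrative data, not from the paper's corpora).
d1 = "kullback leibler distance between short texts".split()
d2 = "jaccard similarity between short texts".split()
d3 = "clustering narrow domain corpora".split()

print(document_kl_distance(d1, d1))  # 0.0, identical documents
print(document_kl_distance(d1, d2))  # smaller: three shared terms
print(document_kl_distance(d1, d3))  # larger: no shared terms
print(jaccard_similarity(d1, d2))    # 0.375 (3 shared of 8 distinct terms)
```

Note that the Jaccard measure is a similarity (higher means closer) while the KL-based measure is a distance (lower means closer), so a clustering method would need to use them in opposite directions.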