Publications

Abstract

Carlos D. Martínez-Hinarejos, Alfons Juan, Francisco Casacuberta. Generalized k-Medians Clustering for Strings. Pattern Recognition and Image Analysis, First Iberian Conference IbPRIA 2003 Proceedings, 2003. Francisco J. Perales, Aurélio J.C. Campilho, Nicolás Pérez de la Blanca, Alberto Sanfeliu (Editors). pp. 502-509. Springer-Verlag.

Clustering methods are used in pattern recognition to obtain natural groups from a data set in the framework of unsupervised learning, as well as to obtain clusters of data from a known class. For sets of strings, the concept of the set median string can be extended to the (set) k-medians problem. The solution of the k-medians problem can be viewed as a clustering method in which each cluster is generated by one of the k strings of that solution. A concept related to the set median string is the (generalized) median string, whose computation is an NP-hard problem. However, different algorithms have been proposed to find approximations to the (generalized) median string. We propose extending the (generalized) median string problem to k strings, resulting in the generalized k-medians problem, which can also be viewed as a clustering technique. This new technique is applied to a corpus of chromosomes represented by strings and compared to the conventional k-medians technique.
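
As a rough illustration of the conventional (set) k-medians technique mentioned in the abstract, the sketch below alternates between assigning each string to its nearest median and recomputing each cluster's set median. It is a minimal sketch under stated assumptions, not the authors' implementation: the unit-cost Levenshtein distance, the function names (`edit_distance`, `set_median`, `assign`, `set_k_medians`), and the caller-supplied initial medians are all hypothetical choices made for this example. A generalized k-medians variant would replace `set_median` with an approximation to the generalized median string, which is not restricted to members of the cluster; the abstract does not specify which approximation algorithm is used, so none is shown here.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (an assumed choice of string distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]


def set_median(strings: Sequence[str]) -> str:
    """Member of `strings` with the smallest summed distance to all members."""
    return min(strings, key=lambda c: sum(edit_distance(c, s) for s in strings))


def assign(strings: Sequence[str], medians: Sequence[str]) -> List[List[str]]:
    """Put each string in the cluster of its nearest median (ties go to the lowest index)."""
    clusters: List[List[str]] = [[] for _ in medians]
    for s in strings:
        nearest = min(range(len(medians)), key=lambda i: edit_distance(s, medians[i]))
        clusters[nearest].append(s)
    return clusters


def set_k_medians(strings: Sequence[str],
                  initial_medians: Sequence[str],
                  max_iter: int = 100) -> Tuple[List[str], List[List[str]]]:
    """Alternate assignment and set-median updates until the medians stop changing."""
    medians = list(initial_medians)
    for _ in range(max_iter):
        clusters = assign(strings, medians)
        # An empty cluster keeps its previous median (a simplifying assumption).
        updated = [set_median(c) if c else m for c, m in zip(clusters, medians)]
        if updated == medians:
            break
        medians = updated
    return medians, assign(strings, medians)


if __name__ == "__main__":
    data = ["aaaa", "aaab", "aaba", "zzzz", "zzzy", "zzyz"]
    medians, clusters = set_k_medians(data, initial_medians=["aaab", "zzzy"])
    print(medians)
    print(clusters)
```

Because each set median is chosen from the cluster itself, an update costs a number of distance computations that is quadratic in the cluster size. A generalized median is searched for over all possible strings, which is where the NP-hardness noted in the abstract comes from.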