Advanced search


Carlos D. Martínez-Hinarejos. La cadena media y sus aplicaciones en el Reconocimiento de Formas. Universidad Politécnica de Valencia. 2003. (In spanish). Advisors: A. Juan and F. Casacuberta

In the Pattern Recognition field, good prototypes are needed for each class when using distance-based classification techniques (and more specifically, k-NN classifiers). One possibility is to use the mean of the class as prototype of the class (or, when a partition in subclasses of the class is available, the set of the means from each subclass). In euclidean spaces (vectorial representation), to compute the mean is a simple problem. However, when data is represented by strings, the problem of computing the so-called median string is NP-Hard. Therefore, approximations to the median string are defined for classification purposes. The classic approximation is the set median string. New approximations are proposed following several methods. The first proposed method is a greedy method whose results do not improve set median results. Afterwards, two approximated methods based on iterative perturbation are proposed. These new approximations achieve better classification results than set median, although their computational cost is higher. Afterwards, some interesting topics are studied. An alternative definition of median string is proposed (this new definition does not show practical differences with the classical definition). Some specific cost reduction techniques are introduced for the iterative perturbation methods (but producing worse prototypes). The exact calculation of the median string is also performed using Branch and Bound. The exact results show that the proposed approximations are very accurate. Then, the proposed approximations are use for clustering methods (to obtain several clusters in each class). This new proposal presents a better behaviour than the classical method (k-medians). The proposed approximations are also used in cyclic strings. The results reveal that the approximations to the median string outperform the set median in classification in cyclic strings. Finally, parametric methods and distance-based methods are compared for classification in string sets. This comparation reveals that non-parametric methods perform better as the number of clusters of each class increases.