See the related question "K-means: Why minimizing WCSS is maximizing Distance between clusters?", and see also another answer here for an interpretation of k-means that actually involves pointwise Euclidean distances.

The way k-means is constructed is not based on distances. K-means minimizes within-cluster variance. Now if you look at the definition of variance, it is identical to the sum of squared Euclidean distances from the center (another answer here refers to pairwise Euclidean distances!). The basic idea of k-means is to minimize squared errors.

Why it is not correct to use arbitrary distances: k-means may stop converging with other distance functions. The common proof of convergence goes like this: the assignment step and the mean-update step both optimize the same criterion. There is only a finite number of possible assignments, so the algorithm must converge after a finite number of improvements. To use this proof for other distance functions, you must show that the mean (note: k-means) minimizes your distances, too.

If you are looking for a Manhattan-distance variant of k-means, there is k-medians, because the median is a known best L1 estimator. If you want arbitrary distance functions, have a look at k-medoids (aka PAM, partitioning around medoids). The medoid minimizes arbitrary distances (because it is defined as the minimizer), and there is only a finite number of possible medoids, too. It is much more expensive to compute than the mean, though.
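To see why the mean fits the squared-error criterion while the median fits the L1 criterion, here is a small self-contained numpy check (illustrative only): it scans candidate 1-d centers and reports which one minimizes each loss.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(size=500)          # skewed sample, so mean != median

# Scan candidate centers and evaluate both criteria.
centers = np.linspace(x.min(), x.max(), 2001)
l2 = ((x[None, :] - centers[:, None]) ** 2).sum(axis=1)   # squared (L2) error
l1 = np.abs(x[None, :] - centers[:, None]).sum(axis=1)    # absolute (L1) error

print("L2 argmin:", centers[l2.argmin()], " mean:  ", x.mean())
print("L1 argmin:", centers[l1.argmin()], " median:", np.median(x))
```

Up to the grid resolution, the L2 minimizer matches the mean and the L1 minimizer matches the median, which is exactly why k-means uses the mean and k-medians the median.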
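For completeness, a minimal k-medoids-style sketch, assuming a precomputed dissimilarity matrix `D` from any distance function. This is the simple alternating variant (assign to the nearest medoid, then recompute each cluster's medoid), not full PAM with its swap phase; the function name and parameters are made up for the example.

```python
import numpy as np

def k_medoids(D, k, n_iter=100, seed=0):
    """Alternating k-medoids on a precomputed (n x n) dissimilarity matrix D."""
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)          # assignment step
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # medoid = the member minimizing total distance to its cluster
                within = D[np.ix_(members, members)]
                new_medoids[c] = members[within.sum(axis=1).argmin()]
        if np.array_equal(new_medoids, medoids):       # finite medoid set -> converges
            break
        medoids = new_medoids
    return medoids, labels
```

Because each medoid is itself a data point chosen as a minimizer, the finite-improvements convergence argument above goes through for any dissimilarity.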
The K-Means procedure - a vector quantization method often used as a clustering method - does not explicitly use pairwise distances between data points at all (in contrast to hierarchical and some other clusterings, which allow for an arbitrary proximity measure). It amounts to repeatedly assigning points to the closest centroid, thereby using the Euclidean distance from data points to a centroid. However, K-Means is implicitly based on pairwise Euclidean distances between data points, because the sum of squared deviations from the centroid is equal to the sum of pairwise squared Euclidean distances divided by the number of points.

The term "centroid" is itself from Euclidean geometry: it is the multivariate mean in Euclidean space. Euclidean space is about Euclidean distances, and non-Euclidean distances will generally not span Euclidean space. That's why K-Means is for Euclidean distances only.
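The identity invoked above - the sum of squared deviations from the centroid equals the sum of pairwise squared Euclidean distances divided by the number of points - is easy to check numerically; a small sketch, counting each unordered pair once:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))                 # 50 points in R^3

# Sum of squared deviations from the centroid.
ss_dev = ((X - X.mean(axis=0)) ** 2).sum()

# Sum of squared Euclidean distances over unordered pairs.
diff = X[:, None, :] - X[None, :, :]         # all ordered pairs
sq_pair = (diff ** 2).sum(axis=2)
pairwise = sq_pair.sum() / 2                 # each unordered pair once

print(ss_dev, pairwise / len(X))             # equal up to float error
```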
But a Euclidean distance between two data points can be represented in a number of alternative ways. For example, it is closely tied to the cosine or the scalar product between the points. If you have cosines, covariances, or correlations, you can always (1) transform them to (squared) Euclidean distances, then (2) create data for that matrix of Euclidean distances (by means of Principal Coordinates analysis or other forms of metric Multidimensional Scaling), and (3) input those data to K-Means clustering. Therefore, it is possible to make K-Means "work with" pairwise cosines or the like; in fact, such implementations of K-Means clustering exist (see also "K-means for distance matrix" implementations). It is possible to program K-Means so that it calculates directly on the square matrix of pairwise Euclidean distances, of course, but it will work slowly, so the more efficient way is to create data for that distance matrix (converting the distances into scalar products and so on - the pass outlined above) and then apply the standard K-Means procedure to that dataset; a sketch of this pipeline follows below.

Please note that I was discussing whether Euclidean or non-Euclidean dissimilarity between data points is compatible with K-Means. This is related to, but not quite the same as, the question of whether non-Euclidean deviations from the centroid (in a wide sense, centre or quasicentroid) can be incorporated into K-Means or a modified "K-means".
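Here is a sketch of that (1)-(2)-(3) pipeline, illustrative rather than production code: classical MDS (Torgerson double-centering) recovers coordinates from a squared-Euclidean distance matrix, and a plain Lloyd-style k-means then runs on those coordinates. All function names are made up for the example.

```python
import numpy as np

def coords_from_sq_dists(D2, n_dims):
    """Classical MDS: coordinates whose Euclidean distances reproduce D2."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ D2 @ J                     # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:n_dims]     # keep the largest eigenvalues
    vals = np.clip(vals[idx], 0, None)        # guard tiny negative eigenvalues
    return vecs[:, idx] * np.sqrt(vals)

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Standard Lloyd iterations on plain coordinate data."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)                                  # assignment step
        centers = np.array([X[labels == c].mean(axis=0)
                            if (labels == c).any() else centers[c]
                            for c in range(k)])                     # mean update
    return labels

# e.g., turn a cosine-similarity matrix S of unit vectors into squared
# Euclidean distances: D2 = 2 * (1 - S); then
# labels = lloyd_kmeans(coords_from_sq_dists(D2, n_dims=5), k=3)
```

For unit-length vectors, $\lVert u - v \rVert^2 = 2(1 - \cos(u, v))$, which is the step (1) conversion used in the usage comment.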
Since this is apparently now a canonical question, and it hasn't been mentioned here yet: one natural extension of k-means to use distance metrics other than the standard Euclidean distance on $\mathbb R^p$ is to use the kernel trick. This refers to the idea of implicitly mapping the inputs to a high- or infinite-dimensional Hilbert space, where distances correspond to the distance function we want to use, and running the algorithm there. That is, letting $\varphi : \mathbb R^p \to \mathcal H$ be some feature map such that the desired metric $d$ can be written $d(x, y) = \lVert \varphi(x) - \varphi(y) \rVert_{\mathcal H}$, we run k-means on the mapped points $\varphi(x_i)$.
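A compact kernel k-means sketch along these lines, assuming only a precomputed Gram matrix $K_{ij} = \langle \varphi(x_i), \varphi(x_j) \rangle_{\mathcal H}$ (the function name is made up for the example). The squared feature-space distance from $\varphi(x_i)$ to a cluster mean expands, via the kernel trick, into kernel evaluations only; the $K_{ii}$ term is dropped since it is constant across clusters and does not affect the argmin:

```python
import numpy as np

def kernel_kmeans(K, k, n_iter=100, seed=0):
    """Lloyd-style k-means in the feature space implied by Gram matrix K."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=n)
    for _ in range(n_iter):
        d2 = np.empty((n, k))
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                d2[:, c] = np.inf                 # skip empty clusters
                continue
            # ||phi(x_i) - mean_c||^2, up to the constant K_ii term:
            # (1/|c|^2) sum_{j,l in c} K_jl  -  (2/|c|) sum_{j in c} K_ij
            d2[:, c] = (K[np.ix_(members, members)].mean()
                        - 2.0 * K[:, members].mean(axis=1))
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# e.g., an RBF kernel Gram matrix from data X (hypothetical parameter gamma):
# sq = ((X[:, None] - X[None]) ** 2).sum(-1); K = np.exp(-gamma * sq)
```

Note that the cluster means $\mu_c = \frac{1}{|c|}\sum_{j \in c} \varphi(x_j)$ are never materialized; they exist only implicitly in $\mathcal H$, which is the whole point of the trick.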