# Chapter 4: K-Medoids

To use an arbitrary distance measure while still guaranteeing convergence, one can turn to K-Medoids. In that sense, K-Medoids can be viewed as the most general version of K-Means.

K-Medoids will converge regardless of the distance measure used, so in this example, we’ll use cosine distance.
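The book's `utils.distances.cosine` helper is not shown here; a plausible definition, assuming it returns one minus the cosine similarity of the two vectors, would be:

```python
import numpy as np

def cosine(a, b):
    # Cosine distance: 1 - cosine similarity of the two vectors.
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Parallel vectors have distance 0; orthogonal vectors have distance 1.
print(cosine(np.array([1.0, 0.0]), np.array([2.0, 0.0])))  # → 0.0
print(cosine(np.array([1.0, 0.0]), np.array([0.0, 3.0])))  # → 1.0
```

Note that cosine distance ignores vector magnitudes entirely, which is exactly why K-Means' mean-based centroid update is not guaranteed to converge with it.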

```python
import numpy as np
import copy

from unsupervised.kmeans import KMeans
from utils.distances import pdist, euclidean, manhattan, cosine


class KMedoids(KMeans):
    def _distance(self, a, b):
        return cosine(a, b)
```

Like K-Medians, K-Medoids requires a slightly different way of defining the centroids. K-Medoids defines a centroid, in this case a medoid, as the *most central* point in a cluster. In practice, this means that every data point is a candidate centroid, and at each step the cluster member that minimizes the within-cluster sum of distances is picked.

```python
    def _compute_centroids(self, X, labels):
        # The pairwise distances never change, so compute them once and cache.
        if not hasattr(self, "distances"):
            self.distances = pdist(X, self._distance)
        centroids = []
        for i in range(self.k):
            # Distances between all pairs of points within cluster i.
            distances = self.distances[np.ix_(labels == i, labels == i)]
            # For each member, its total distance to the rest of the cluster.
            within_cluster_sum_of_distances = np.sum(distances, axis=0)
            # The medoid is the member with the smallest total distance.
            centroid = X[labels == i][np.argmin(within_cluster_sum_of_distances)]
            centroids.append(centroid)
        return np.array(centroids)
```
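To see the medoid selection in isolation, here is a self-contained sketch of the same indexing logic. The `pdist` helper below is a hypothetical stand-in for the book's `utils.distances.pdist`, assumed to build a full pairwise distance matrix; Euclidean distance is used to keep the arithmetic easy to check by hand.

```python
import numpy as np

def pdist(X, metric):
    # Stand-in for utils.distances.pdist: full pairwise distance matrix.
    n = len(X)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            D[i, j] = metric(X[i], X[j])
    return D

def euclidean(a, b):
    return np.linalg.norm(a - b)

X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
labels = np.array([0, 0, 0])  # a single cluster containing all three points

D = pdist(X, euclidean)
# Sum of distances from each member to every other member of the cluster.
within = np.sum(D[np.ix_(labels == 0, labels == 0)], axis=0)
medoid = X[labels == 0][np.argmin(within)]
print(medoid)  # → [1. 0.], the middle point minimizes the summed distances
```

Summed distances are 11, 10, and 19 respectively, so the middle point wins. Unlike a mean, the medoid is always an actual data point.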

And like K-Medians, we don't need to define the rest of the logic, as it is exactly the same as in K-Means!
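Since the book's `KMeans` base class is not reproduced here, the full assign-then-update loop that the subclass inherits can be sketched in a self-contained form. This is a minimal illustration of the algorithm with Euclidean distance, not the book's implementation; the function name, data, and iteration count are all illustrative.

```python
import numpy as np

def kmedoids(X, k, n_iter=10, seed=0):
    # Minimal K-Medoids sketch: assign each point to its nearest medoid,
    # then replace each medoid with the most central member of its cluster.
    rng = np.random.default_rng(seed)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise distances
    medoid_idx = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoid_idx], axis=1)
        for i in range(k):
            members = np.flatnonzero(labels == i)
            within = D[np.ix_(members, members)].sum(axis=0)
            medoid_idx[i] = members[np.argmin(within)]
    return X[medoid_idx], labels

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
medoids, labels = kmedoids(X, k=2)
print(sorted(map(tuple, medoids)))  # → [(0.0, 0.0), (5.0, 5.0)]
```

Because medoids are drawn from the data itself and each update can only decrease the total within-cluster distance, the loop converges for any distance measure, which is the property this chapter set out to obtain.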