Chapter 4: K-Medoids

K-Medoids lets us use any distance measure while still ensuring convergence. In that sense, one can view K-Medoids as the most general version of K-Means.

Since K-Medoids will converge regardless of the distance measure used, we are free to pick one that K-Means offers no guarantees for; in this example, we'll use cosine distance.

import numpy as np

from unsupervised.kmeans import KMeans
from utils.distances import pdist, euclidean, manhattan, cosine


class KMedoids(KMeans):

    def _distance(self, a, b):
        # Override the distance used by the base class; any of the
        # imported measures would work, but here we use cosine distance.
        return cosine(a, b)
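
For reference, here is a minimal sketch of what the imported cosine function might look like, computing one minus the cosine similarity (the book's actual implementation lives in utils.distances and may differ):

import numpy as np

def cosine(a, b):
    # Cosine distance: 1 minus the cosine similarity of the two vectors,
    # ranging from 0 (same direction) to 2 (opposite directions).
    return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))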

Like K-Medians, K-Medoids requires a slightly different way to define the centroids. K-Medoids defines a centroid, in this case called a medoid, as the most central point in a cluster. In practice, this means that every data point is a potential centroid: at each step, the point with the smallest total distance to all other points in its cluster is picked, as this is the point that minimizes the inertia.

    def _compute_centroids(self, X, labels):

        # The pairwise distance matrix depends only on X, so compute it
        # once on the first call and cache it for subsequent iterations.
        if not hasattr(self, "distances"):
            self.distances = pdist(X, self._distance)

        centroids = []

        for i in range(self.k):

            # Select the submatrix of distances between points in cluster i.
            distances = self.distances[np.ix_(labels == i, labels == i)]
            within_cluster_sum_of_distances = np.sum(distances, axis=0)

            # The medoid is the point with the smallest total distance
            # to every other point in its cluster.
            centroid = X[labels == i][np.argmin(within_cluster_sum_of_distances)]

            centroids.append(centroid)

        return np.array(centroids)
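
The pdist helper imported above is assumed to return the full symmetric matrix of pairwise distances between the rows of X; here is a minimal sketch under that assumption (the actual implementation in utils.distances may differ):

import numpy as np

def pdist(X, distance):
    # Full n x n matrix of pairwise distances under an arbitrary
    # distance function; distances are symmetric, so each pair is
    # computed once and mirrored.
    n = len(X)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = distance(X[i], X[j])
    return D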

And like K-Medians, we don’t need to define the rest of the logic, as it is exactly the same as in K-Means!
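
To see it in action, here is a quick usage sketch; it assumes the KMeans base class from the earlier chapters accepts the number of clusters as k and exposes a fit method that stores the final centroids on the model (adjust the names if your base class differs):

import numpy as np

# Some random 5-dimensional points to cluster.
X = np.random.rand(150, 5)

model = KMedoids(k=3)
model.fit(X)

# Unlike K-Means centroids, each medoid is an actual row of X
# (assuming the base class stores centroids under this name).
print(model.centroids)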