Chapter 5: Optimization

There are certain ways that K-Means can be optimized. One is to ensure that the local minimum that is found as is close to the global minimum, and the other is to find the optimal number of clusters.

Choosing K

In some applications, K might be known in advance, and therefore, this section can be skipped. But in many applications, it is not quite clear what number of K is best. If we think about it, we see that increasing K will almost always reduce the inertia. Why? Because if there are more centroids, then each point is likely to be closer to one.

Therefore, we can’t simply loop through values of K and see which one has the lowest inertia, as this would almost always yield a value close to the number of data points, which is not quite helpful. One technique to solve this is called the Elbow method. In short, the Elbow method tries out more and more clusters, until the inertia’s decrease starts to slow down.

On the plot, we can see that the elbow seems to be located at \(K=3\).

Multiple runs

Because the outcome of K-Means is dependent on the initial centroids, it is common to run the algorithm multiple times. Code-wise, it simply means adding a loop around the fit method. For interface reasons, the method will be renamed _fit.

    def fit(self, X, y=None):

        self.inertia_ = float('inf')  # initialize as worst possible inertia

        for run in range(self.n_runs):

            for i, (c, l) in enumerate(self._fit(X, y)):
                centroids, labels = c, l

                if i > self.max_iters:

            inertia = self._inertia(X, centroids, labels)

            if inertia < self.inertia_:

                self.inertia_ = inertia
                self.centroids_ = centroids
                self.labels_ = labels

        return self

Better Initialization

Running the algorithm multiple times will automatically increase the chances of finding a good local optimum. On top of this, people have come up with different ways to initialize the centroids which increase both the speed and the quality of the local optimum that is found.

One famous way is the kmeans++ algorithm. It works as follows:

  • Pick a random centroid
  • Compute the distance \(d_i\) of each point \(i\) to that centroid
  • Select the next centroid randomly, such that each point has probability \(d^2\) to be chosen.
  • Continue until the desired number of centroids is picked.

Intuitively, kmeans++ ensures that the centroids are well spread out, which improves the performance of K-Means in practice.

    def _initialize_centroids(self, X):

        centroids = []

        weights = np.ones(len(X))
        weights /= weights.sum()

        for k in range(self.k):

            centroid = X[np.random.choice(np.arange(len(X)), 1, p=weights)[0], :]


            distances = np.array([self._distance(centroid, x) for x in X])

            weights = distances**2
            weights /= weights.sum()

        return centroids