Chapter 1: Introduction

K-Means is a well-known clustering algorithm. It was devised by Stuart Lloyd in 1957 but not published until the 1980s. In the meantime, E. W. Forgy published essentially the same technique, which is why K-Means is sometimes referred to as the Lloyd-Forgy method.

It groups data points into K distinct clusters. Each cluster is represented by a prototype point called the centroid, which is the mean of the points assigned to that cluster.

Looking at the plot above, there seem to be three distinct groups. To identify these clusters, K-Means partitions the data into K clusters such that the within-cluster sum of squares, also called the within-cluster variance, is minimized.

Intuitively, each cluster is summarized by its mean, the average of all the points it contains. Minimizing the within-cluster variance means that the points assigned to a cluster lie as close as possible to that cluster's mean.
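To make this concrete, here is a small sketch of the within-cluster sum of squares, assuming NumPy and a toy 2-D dataset with a given cluster assignment (the data, labels, and function name are illustrative, not from the text):

```python
import numpy as np

# Hypothetical toy data: four 2-D points and a cluster label for each.
points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [8.5, 9.0]])
labels = np.array([0, 0, 1, 1])

def within_cluster_ss(points, labels):
    """Sum of squared distances from each point to its cluster's mean."""
    total = 0.0
    for k in np.unique(labels):
        members = points[labels == k]
        centroid = members.mean(axis=0)  # the mean of the cluster's points
        total += ((members - centroid) ** 2).sum()
    return total

print(within_cluster_ss(points, labels))  # the quantity K-Means minimizes
```

A clustering that groups nearby points together yields a small value; swapping a point into the wrong cluster makes it grow.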

For n data points and K clusters, there are on the order of K^n possible assignments of points to clusters, so trying out all possible clusterings and picking the best one is infeasible for all but the smallest problems. Thankfully, Stuart Lloyd came up with an algorithm that converges to a local minimum much faster in practice. It is an iterative algorithm that reduces the within-cluster variance step by step.

Here is the algorithm:

  1. Initialize K centroids
  2. Assign each point to the cluster with the closest centroid
  3. Update the centroids by averaging the data points assigned to their cluster
  4. Repeat until cluster assignments do not change anymore
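The four steps above can be sketched in NumPy as follows. This is a minimal illustration, not a production implementation: the function name, the random-data-point initialization, and the iteration cap are assumptions.

```python
import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    """A minimal sketch of Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize the K centroids by picking K distinct data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Step 2: assign each point to the cluster with the closest centroid.
        dists = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of the points assigned to it
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([points[labels == j].mean(axis=0)
                                  if (labels == j).any() else centroids[j]
                                  for j in range(k)])
        # Step 4: stop once the centroids, and hence the assignments, settle.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```

Each pass through the loop can only lower (or keep) the within-cluster variance, which is why the algorithm converges to a local minimum.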

It is important to note that the outcome of this algorithm depends on the initial centroids: different starting positions can yield different clusterings. This is why it is customary to run K-Means multiple times with different starting conditions and keep the best result, so as not to get stuck in a poor local minimum.
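A multi-restart strategy can be sketched as follows: run Lloyd's algorithm several times from random initializations and keep the run with the lowest within-cluster variance. The function names and the number of restarts are illustrative assumptions.

```python
import numpy as np

def lloyd(points, k, rng, n_iters=100):
    # One run of Lloyd's algorithm from a random initialization (a sketch).
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        labels = np.linalg.norm(points[:, None] - centroids[None], axis=2).argmin(axis=1)
        new_centroids = np.array([points[labels == j].mean(axis=0)
                                  if (labels == j).any() else centroids[j]
                                  for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Within-cluster sum of squares of the final clustering.
    inertia = ((points - centroids[labels]) ** 2).sum()
    return centroids, labels, inertia

def kmeans_restarts(points, k, n_restarts=10, seed=0):
    # Keep the run with the smallest within-cluster variance.
    rng = np.random.default_rng(seed)
    return min((lloyd(points, k, rng) for _ in range(n_restarts)),
               key=lambda run: run[2])
```

Because each restart draws fresh initial centroids, a single unlucky initialization no longer determines the final clustering.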