# Chapter 1: Introduction

Mean Shift is a non-parametric algorithm which can be used for clustering. One of the big advantages of Mean Shift is that it can find an arbitrary number of clusters. The intuition behind how it works is the following.

Mean Shift assumes that the data points are generated according to some probability density distribution. Visually, this density distribution would look like a map of some terrain, where points are more likely to be generated near the “peaks”. Mean Shift tries to find clusters by finding these high-density regions. This works by first assuming that there are peaks at every data point. Then, these points iteratively “shift” towards a higher-density region. Once all points have shifted towards a peak, the centers of the clusters have been found.

Mathematically, this notion of high-density is represented by the use of a kernel function. Intuitively, kernels functions represent some form of similarity between data points. By computing all pairwise kernel values for the points in a data set, we can estimate the probability density function, and therefore find local maxima. A famous kernel function is the radial basis function.

These are the steps of the algorithm:

- Assign every data point as a center
- Compute the kernel density between each center and all data points
- Shift the center by calculating the weighted average of all points based on kernel densities
- Repeat until all centers have converged