Chapter 7: Hyperparameter Tuning

Training a machine learning model means searching for the best state of the model, that is, the parameter values that fit the data best. Many learning algorithms are themselves parameterized. For instance, certain clustering algorithms require one to input the number of desired clusters. Such settings are called hyperparameters, and hyperparameter tuning is the process of finding the best values for them. For clustering, it could mean finding the best number of clusters.

It is almost never possible to try out all possible hyperparameter values. Thankfully, many different strategies exist to tackle this.

Grid Search

Grid Search is a technique that lays out possible values for the hyperparameters and tries all combinations of them. It is called Grid Search because it could be viewed as putting all combinations of hyperparameters in a grid, and filling out the grid with the performance that each combination achieves, ultimately picking the best one.
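The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_score` is a hypothetical stand-in for "train the model and measure its performance", and the hyperparameter names and values in `grid` are made up for the example.

```python
import itertools


def toy_score(learning_rate, num_clusters):
    # Hypothetical objective (an assumption for this sketch): in practice this
    # would train a model and return a validation score. Higher is better.
    return -(learning_rate - 0.1) ** 2 - (num_clusters - 4) ** 2


def grid_search(score_fn, grid):
    """Evaluate every combination in the grid and return the best one."""
    names = list(grid)
    results = {}
    # itertools.product enumerates every cell of the grid.
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results[values] = score_fn(**params)
    best_values = max(results, key=results.get)
    return dict(zip(names, best_values)), results


# Illustrative candidate values, chosen by hand.
grid = {
    "learning_rate": [0.01, 0.1, 1.0],
    "num_clusters": [2, 4, 8],
}
best_params, results = grid_search(toy_score, grid)
print(best_params, len(results))
```

Note that the number of evaluations is the product of the list lengths (here 3 x 3 = 9), so the cost grows quickly as hyperparameters or candidate values are added.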

Random Search

Random Search is an alternative to Grid Search. It has the advantage that it allows one to provide a range of possible values for each hyperparameter instead of a fixed list, and candidates are drawn at random from those ranges. This means that one could in theory run the search for as long as desired. The randomness often works well because it reduces the bias induced by the hand-picked values of a grid and can surface combinations that one would not have thought to include.
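A minimal sketch of this idea follows. The objective `toy_score`, the hyperparameter names, and the sampling ranges are all assumptions made for illustration; sampling the learning rate on a logarithmic scale is a common choice, not something the technique requires.

```python
import random


def toy_score(learning_rate, num_clusters):
    # Hypothetical objective (an assumption for this sketch); higher is better.
    return -(learning_rate - 0.1) ** 2 - (num_clusters - 4) ** 2


def sample_params(rng):
    # Ranges instead of fixed lists: every call draws a fresh candidate.
    return {
        "learning_rate": 10 ** rng.uniform(-3, 0),  # log-uniform in [0.001, 1]
        "num_clusters": rng.randint(2, 10),         # integer in [2, 10]
    }


def random_search(score_fn, sampler, num_trials, seed=0):
    """Draw num_trials random candidates and keep the best one seen."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    best_params, best_score = None, float("-inf")
    for _ in range(num_trials):
        params = sampler(rng)
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(toy_score, sample_params, num_trials=50)
print(best_params, best_score)
```

The budget `num_trials` is chosen independently of the number of hyperparameters, which is what makes it possible to run the search for as long as desired.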

Bayesian Optimization

Bayesian Optimization is a technique that tries to learn a mapping between the hyperparameters and the performance. At each step, the estimated mapping indicates which hyperparameter values are likely to perform well. Those values are then evaluated, and the result is used to improve the mapping. This technique is a bit more expensive as it involves learning an additional mapping, but it can prove to be very powerful, even when the measured performance is noisy.
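The loop described above can be sketched with a small Gaussian process as the learned mapping and an upper confidence bound (mean plus a multiple of the uncertainty) to choose the next candidate. Both are common choices but are assumptions here, as are the kernel, its length scale, the noise level, the one-dimensional `toy_score` objective, and every function name. Real projects would normally rely on a dedicated library rather than this hand-written version.

```python
import math


def toy_score(x):
    # Hypothetical objective over one hyperparameter; best value is at x = 2.
    return -(x - 2.0) ** 2


def rbf(a, b, length_scale=1.0):
    # Similarity between two hyperparameter values (RBF kernel).
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Predicted mean and standard deviation of the mapping at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))  # clamp tiny negative round-off


def bayesian_optimization(score_fn, candidates, initial_xs, num_steps, kappa=2.0):
    xs = list(initial_xs)
    ys = [score_fn(x) for x in xs]
    for _ in range(num_steps):
        offset = sum(ys) / len(ys)  # center scores for the zero-mean model
        centered = [y - offset for y in ys]

        def upper_confidence_bound(x):
            mean, std = gp_posterior(xs, centered, x)
            return mean + kappa * std

        remaining = [c for c in candidates if c not in xs]
        x_next = max(remaining, key=upper_confidence_bound)
        xs.append(x_next)            # evaluate the promising candidate...
        ys.append(score_fn(x_next))  # ...which refines the next estimate
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], xs


candidates = [i / 10 for i in range(51)]  # 0.0, 0.1, ..., 5.0
best_x, best_score, evaluated = bayesian_optimization(
    toy_score, candidates, initial_xs=[0.0, 5.0], num_steps=15)
print(best_x, best_score)
```

The extra cost mentioned above is visible in `gp_posterior`: every step refits the mapping on all evaluations made so far before a single new candidate is chosen.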