Chapter 3: Evaluation Techniques

When we train machine learning models, we need a way to estimate how well they will perform. One way to do that is to split the data at our disposal into a train and a test set. This way, the models are trained on some of the data, and we can then see how well they perform on data they have never encountered before.

An extremely important thing to understand is that splitting the data into a train and a test set is done only to estimate the performance of models. Once the performance has been estimated, the final model should be trained with as much data as possible.
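This workflow can be sketched with scikit-learn; the dataset and classifier below are illustrative choices, not prescribed by the text:

```python
# A minimal sketch of the train/test workflow: hold out a test set only
# to estimate performance, then retrain on all the data.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Split the data; the test set exists only to produce the estimate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
estimate = model.score(X_test, y_test)  # performance estimate

# Once the estimate is obtained, train the final model on all the data.
final_model = LogisticRegression(max_iter=1000).fit(X, y)
```

Note that `final_model`, not `model`, is the one that would be deployed; the estimate computed on the held-out set is our best guess of how it will behave.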

(Stratified) K-Fold Cross-Validation

We want to estimate how well our model, when trained with the entire dataset, will perform. Therefore, for the evaluation, we should use a model that is as similar as possible to that final model. This is why the training set is typically bigger than the test set. On the other hand, the model must be tested on a test set large enough to give a reliable estimate. If this is not possible, cross-validation can be used. K-Fold cross-validation is a method which splits the data into K folds of identical size. Then, each fold in turn is held out and used as a test set, while the remaining K-1 folds are used for training; performance is computed on the held-out fold. After each fold has been evaluated, the K performance estimates are averaged.
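The procedure described above can be sketched as follows; the dataset, model, and choice of K = 5 are illustrative assumptions:

```python
# A sketch of K-Fold cross-validation: each fold is held out once,
# the model is trained on the remaining folds, and the per-fold
# scores are averaged at the end.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

scores = []
for train_idx, test_idx in kf.split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])                 # train on K-1 folds
    scores.append(model.score(X[test_idx], y[test_idx]))  # test on held-out fold

mean_score = np.mean(scores)  # averaged performance over the K folds
```

Each data point lands in the test set exactly once, so the averaged score uses every point for evaluation while each individual model still trains on most of the data.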

A cross-validation is stratified when each fold is made to have similar statistics (in classification, typically the class distribution) as the full dataset. It has also been found empirically that using 10 folds performs well; however, any value of K can be used. When there are as many folds as data points, the method is referred to as Leave-One-Out.
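Both variants are easy to demonstrate; the imbalanced toy labels below are an illustrative assumption chosen so that the class ratio divides evenly across 10 folds:

```python
# Stratified folds preserve the class distribution of the full dataset;
# Leave-One-Out uses as many folds as there are data points.
import numpy as np
from sklearn.model_selection import StratifiedKFold, LeaveOneOut

# A toy imbalanced label vector: 80% class 0, 20% class 1.
y = np.array([0] * 80 + [1] * 20)
X = np.zeros((100, 1))  # dummy features; only the labels matter here

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
ratios = []
for _, test_idx in skf.split(X, y):
    # Fraction of class 1 in each test fold: matches the full dataset.
    ratios.append(np.mean(y[test_idx] == 1))

# Leave-One-Out is K-Fold with K equal to the number of data points.
loo = LeaveOneOut()
n_splits = loo.get_n_splits(X)
```

With 100 points and 10 folds, every stratified test fold here contains exactly 2 of the 20 minority-class points, mirroring the 80/20 split of the whole dataset.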

Data must be representative

First and foremost, for our performance estimate to make any sense, we need to train our model on data that is representative of the data it will encounter later. It is like studying for a French exam but then having to take a German exam: it doesn’t matter how good we are at French if we are judged on how well we know German. Since we don’t know what data we will encounter in the future, we need to make assumptions, and it is crucial to keep these assumptions in mind in order to train a model well suited to the task.