Chapter 4: Evaluating Regression

Machine learning is all about getting better and better at a task. Therefore, we need to define what it means to be good. In the case of regression, being good means producing outputs that are as close as possible to the target variable. But how do we define close? Several performance metrics exist to evaluate regression.

Mean-Squared Error

The mean-squared error is probably the most commonly used metric for regression, and it is often the default metric in machine learning packages. It is defined as the average of the squared errors. Because the errors are squared, large errors are penalized disproportionately more than small ones.

import numpy as np

def MSE(predicted_target, target):
    # Difference between predictions and true values
    errors = predicted_target - target
    
    # Average of the squared errors
    return np.mean(errors**2)
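
As a quick illustration of the squaring effect, the following sketch (using arbitrary example values) compares four small errors against a single large one; the total absolute error is the same in both cases, but the MSE is not:

# Four predictions, each off by 1: MSE = 1.0
print(MSE(np.array([1.0, 2.0, 3.0, 4.0]), np.array([2.0, 3.0, 4.0, 5.0])))

# One prediction off by 4, the rest exact: same total error, but MSE = 4.0
print(MSE(np.array([1.0, 2.0, 3.0, 8.0]), np.array([1.0, 2.0, 3.0, 4.0])))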

Root Mean-Squared Error

The root mean-squared error is closely related to the mean-squared error: it is simply the square root of that metric. It has the advantage of being expressed in the same units as the target variable, so it can be loosely interpreted as the typical distance between the predictions and the targets.

def RMSE(predicted_target, target):
    return np.sqrt(MSE(predicted_target, target))

Mean Absolute Error

As opposed to the mean-squared error, the mean absolute error treats every unit of error as equally bad: an error twice as large is only twice as costly, so large errors are not penalized disproportionately.

def MAE(predicted_target, target):
    # Difference between predictions and true values
    errors = predicted_target - target
    
    # Average of the absolute errors
    return np.mean(np.abs(errors))
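
To see the contrast with the mean-squared error, the sketch below (again with arbitrary example values) evaluates the same predictions with both metrics; the single large error dominates the MSE but not the MAE:

predictions = np.array([1.0, 2.0, 3.0, 8.0])
targets = np.array([1.0, 2.0, 3.0, 4.0])

print(MAE(predictions, targets))  # 1.0: the outlier contributes linearly
print(MSE(predictions, targets))  # 4.0: the outlier contributes quadratically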

R-Squared

R-squared is also often referred to as the coefficient of determination, or the explained variance. It represents how much of the target's variance is explained by the predictions: 1 is best, 0 corresponds to always predicting the mean of the target, and the score can even be negative for a model that does worse than that.

def RSquared(predicted_target, target):
    # Residual sum of squares: error of the predictions
    numerator = np.sum((target - predicted_target)**2)
    # Total sum of squares: error of always predicting the mean
    denominator = np.sum((target - np.mean(target))**2)
    
    return 1.0 - (numerator / denominator)
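
As a quick check of the scale (with arbitrary example values), perfect predictions score 1, always predicting the mean scores 0, and predictions worse than the mean score below 0:

targets = np.array([1.0, 2.0, 3.0, 4.0])

print(RSquared(targets, targets))                         # 1.0: perfect predictions
print(RSquared(np.full(4, np.mean(targets)), targets))    # 0.0: always predicting the mean
print(RSquared(np.array([4.0, 3.0, 2.0, 1.0]), targets))  # -3.0: worse than the mean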

Custom Metrics

A simple example would be to use weighted versions of the metrics above, which makes it more important to perform well on certain data points than on others. It is also possible to build a fully custom metric around a custom error function; perhaps, in your application, overshooting the target is much worse than undershooting it, for instance.
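
As an illustration, here is a minimal sketch of both ideas; the weights and the overshoot penalty factor are arbitrary choices for the example, not values prescribed by any particular application:

def WeightedMSE(predicted_target, target, weights):
    # Weighted average of the squared errors: points with larger
    # weights contribute more to the final score
    errors = predicted_target - target
    return np.average(errors**2, weights=weights)

def AsymmetricMAE(predicted_target, target, overshoot_factor=2.0):
    # Overshooting (predicting too high) is penalized more heavily
    # than undershooting, by the chosen factor
    errors = predicted_target - target
    penalties = np.where(errors > 0, overshoot_factor * np.abs(errors), np.abs(errors))
    return np.mean(penalties)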

The metric should ultimately capture what it means for your regression to be good, whatever that may mean in your application.