Chapter 4: Regression

Regression is one of the most common machine learning tasks. It is a supervised task in which a model maps inputs to a continuous output. More formally, a regression problem can be defined as learning a function \(f\) that maps input variables \(X = (x_0, x_1, \dots, x_{m-1}, x_m)\) to a continuous target variable \(y\) such that \(f(X) = y\). For instance, suppose we have the following data:

Variable 1  Variable 2  Variable 3  Variable 4  Target variable
1           2           3           4           10
2           3           4           5           14
3           4           5           6           18
2000        2001        2002        2003        8006

We would want to learn a function \(f\) such that \(f(1,2,3,4)=10\), \(f(2,3,4,5)=14\), and so on.
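In this toy table the target happens to be the sum of the four variables, so a linear function can reproduce it exactly. The sketch below assumes NumPy and a linear model with no intercept (both are our choices, not the text's), fits the weights by least squares, and checks that the fitted function reproduces every row. The helper `f` is hypothetical and takes one row as a list.

```python
import numpy as np

# The toy table: four input variables per row, one target per row.
X = np.array([
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [3, 4, 5, 6],
    [2000, 2001, 2002, 2003],
], dtype=float)
y = np.array([10, 14, 18, 8006], dtype=float)

# Fit a linear model f(x) = w . x (no intercept) by least squares.
# The columns are linearly dependent (each row is a, a+1, a+2, a+3),
# so many weight vectors fit equally well; lstsq returns one of them.
w, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)


def f(x):
    """Hypothetical helper: evaluate the fitted linear function on one row."""
    return float(np.dot(w, x))


# The fitted function reproduces every target in the table.
print(np.allclose(X @ w, y))          # True
print(round(f([10, 11, 12, 13]), 6))  # 46.0
```

Because every row has the form (a, a+1, a+2, a+3), the table alone cannot pin down \(f\) for inputs that break that pattern, which is why the sketch checks predictions rather than particular weights.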

Regression is often compared to curve fitting, since it tries to find a function \(f\) whose curve closely follows the data.
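To make the curve-fitting view concrete, here is a minimal sketch assuming NumPy. It generates a few hypothetical points from a known quadratic (illustrative only, not data from this chapter) and recovers the curve with `np.polyfit`, which fits polynomial coefficients by least squares.

```python
import numpy as np

# Hypothetical sample points generated from y = 2x^2 - 3x + 1,
# purely for illustration.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 2 * x**2 - 3 * x + 1

# Fit a degree-2 polynomial by least squares; coefficients are returned
# highest power first: [a, b, c] for a*x^2 + b*x + c.
coeffs = np.polyfit(x, y, deg=2)

# Turn the coefficients into a callable f and evaluate it at a new point.
f = np.poly1d(coeffs)
print(np.round(coeffs, 6))
print(round(f(4.0), 6))  # 21.0
```

With real, noisy data the fitted curve would only follow the points approximately, and the chosen degree controls how flexible that curve is.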

Regression can be used for a wide range of tasks. Below are a few practical examples where it comes in handy.

Predicting house prices

Input variables: Number of bedrooms, whether it has a garage, living surface, the age of the house
Target variable: Price of the house

Bedrooms  Garage (1 = yes)  Living surface  Age (years)  Price ($)
3         0                 3000            1            245000
2         1                 2650            14           312040
4         0                 4000            60           180000
5         1                 5432            4            800670

This could be useful for estimating a house's value, whether you are buying or selling one.

Predicting students’ grades

Input variables: Grade on last test, GPA
Target variable: Grade on final exam

Last test  GPA  Final exam
3          5.5  5
10         7    6
7          8    7.5
8          9.2  10

A teacher could use this to identify which students might require additional attention.
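As a rough illustration, the sketch below fits an ordinary least-squares model with an intercept to the four rows of the grades table, assuming NumPy. The linear form, the `predict` helper, and the new student's values are our own hypothetical choices.

```python
import numpy as np

# Grades table rows: (grade on last test, GPA) -> grade on final exam.
features = np.array([
    [3, 5.5],
    [10, 7],
    [7, 8],
    [8, 9.2],
], dtype=float)
final_exam = np.array([5, 6, 7.5, 10], dtype=float)

# Add a column of ones so the model can learn an intercept:
# final_exam ~ b0 + b1 * last_test + b2 * gpa
design = np.column_stack([np.ones(len(features)), features])

# Ordinary least squares: choose b0, b1, b2 minimizing the squared errors.
coef, _, _, _ = np.linalg.lstsq(design, final_exam, rcond=None)


def predict(last_test, gpa):
    """Hypothetical helper: predicted final-exam grade for one student."""
    return coef[0] + coef[1] * last_test + coef[2] * gpa


print(predict(6, 7.5))  # a hypothetical student not in the table
```

With only four students this fit is far from reliable; it only shows the mechanics. Because the model includes an intercept, the training residuals of the least-squares fit sum to zero.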

Predicting how likely a customer is to default on a loan

Input variables: Income, age, number of children, whether married
Target variable: Likelihood of defaulting (recorded as 0 or 1 in the example data below)

Income  Age  Children  Married (1 = yes)  Likelihood of defaulting
2500    33   1         1                  0
1200    42   3         1                  1
0       18   2         0                  1
9000    28   0         0                  0

A bank could use this to decide whether or not to grant a loan.
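The example values in the loan table are 0 or 1, while the task is to predict a likelihood. One common way to bridge this (our assumption; the text does not name a method) is logistic regression, which passes a linear score through the sigmoid function so the output lies between 0 and 1. The sketch assumes NumPy, reads 1 as "defaulted", standardizes the columns, and fits the weights by plain gradient descent; `default_likelihood` and the applicant in the last line are hypothetical.

```python
import numpy as np

# Loan table rows: income, age, number of children, married (1 = yes).
X = np.array([
    [2500, 33, 1, 1],
    [1200, 42, 3, 1],
    [0, 18, 2, 0],
    [9000, 28, 0, 0],
], dtype=float)
# Assumed reading of the 0/1 column: 1 = defaulted, 0 = did not default.
y = np.array([0, 1, 1, 0], dtype=float)

# Standardize each column so income (thousands) does not swamp the others.
mean, std = X.mean(axis=0), X.std(axis=0)
Z = (X - mean) / std


def sigmoid(t):
    """Squash a real-valued score into a likelihood between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-t))


# Logistic regression: minimize the mean log-loss by gradient descent.
w = np.zeros(Z.shape[1])
b = 0.0
learning_rate = 0.5
for _ in range(2000):
    p = sigmoid(Z @ w + b)           # current predicted likelihoods
    grad_w = Z.T @ (p - y) / len(y)  # gradient of the mean log-loss w.r.t. w
    grad_b = np.mean(p - y)          # gradient w.r.t. the intercept
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b


def default_likelihood(income, age, children, married):
    """Hypothetical helper: predicted likelihood of default for one applicant."""
    z = (np.array([income, age, children, married], dtype=float) - mean) / std
    return float(sigmoid(z @ w + b))


print(default_likelihood(3000, 35, 2, 1))  # hypothetical applicant
```

These four rows can be separated perfectly (income alone does it), so with more iterations the weights keep growing and the predicted likelihoods drift toward 0 and 1; practical models typically add regularization and train on far more data.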