# Chapter 4: Regression

Regression is one of the most common machine learning tasks. It is a supervised task in which a model maps inputs to a continuous output. More formally, a regression problem can be defined as learning a function \(f\) that maps input variables \(X = (x_0, x_1, \dots, x_{m-1}, x_m)\) to a continuous target variable \(y\) such that \(f(X) = y\). For instance, suppose we have the following data:

Variable 1 | Variable 2 | Variable 3 | Variable 4 | Target Variable |
---|---|---|---|---|
1 | 2 | 3 | 4 | 10 |
2 | 3 | 4 | 5 | 14 |
3 | 4 | 5 | 6 | 18 |
… | … | … | … | … |
2000 | 2001 | 2002 | 2003 | 8006 |

We want to learn a function \(f\) such that \(f(1,2,3,4)=10\), \(f(2,3,4,5)=14\), and so on for every row.
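To make the idea of "learning \(f\)" more concrete, the sketch below checks one candidate function against the rows of the toy table. It assumes a hypothetical candidate `f` that simply adds up the four inputs (which happens to reproduce every listed target), and it scores the candidate with the mean squared error, a common choice that this chapter has not introduced yet. The names `rows`, `f`, and `mean_squared_error` are illustrative only.

```python
# Rows copied from the toy table: (input variables, target variable).
rows = [
    ((1, 2, 3, 4), 10),
    ((2, 3, 4, 5), 14),
    ((3, 4, 5, 6), 18),
    ((2000, 2001, 2002, 2003), 8006),
]


def f(x):
    """Hypothetical candidate function: add up the four input variables."""
    return sum(x)


def mean_squared_error(model, data):
    """Average squared gap between the model's outputs and the targets."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


print(f((1, 2, 3, 4)))              # 10
print(mean_squared_error(f, rows))  # 0.0
```

An error of zero means the candidate matches every listed row. On real data a perfect match is rare, and the goal is usually to make this error as small as possible.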

Regression is often compared to curve fitting, since it tries to find a function \(f\) whose curve follows the shape of the data.
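The curve-fitting view can be sketched with the simplest possible curve, a straight line. The example below is a minimal illustration, not a recommended workflow: it assumes we use only Variable 1 from the toy table as the input, and it fits a line with the standard closed-form least-squares formulas. The helper `fit_line` is a hypothetical name introduced here.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and centered squares (unnormalized).
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Variable 1 and the target variable from the listed rows of the toy table.
variable_1 = [1, 2, 3, 2000]
target = [10, 14, 18, 8006]

slope, intercept = fit_line(variable_1, target)
print(slope, intercept)  # 4.0 6.0
```

The fitted line \(y = 4x + 6\) passes through all four listed points, because in this toy data the other three variables are just Variable 1 plus a constant.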

Regression can be used for a wide range of tasks. Below, we list a few practical examples where it could come in handy.

**Predicting house prices**

*Input variables*: Number of bedrooms, whether it has a garage, living surface, the age of the house

*Target variable*: Price of the house

Bedrooms | Garage | Living surface | Age | Price ($) |
---|---|---|---|---|
3 | 0 | 3000 | 1 | 245000 |
2 | 1 | 2650 | 14 | 312040 |
4 | 0 | 4000 | 60 | 180000 |
… | … | … | … | … |
5 | 1 | 5432 | 4 | 800670 |

This could be useful for estimating the value of a house, whether you are selling or buying one.

**Predicting students’ grades**

*Input variables*: Grade on last test, GPA

*Target variable*: Grade on final exam

Last test | GPA | Final Exam |
---|---|---|
3 | 5.5 | 5 |
10 | 7 | 6 |
7 | 8 | 7.5 |
… | … | … |
8 | 9.2 | 10 |

A teacher could use this to identify which students might require additional attention.

**Predicting how likely a customer is to default on a loan**

*Input variables*: Income, age, children, married

*Target variable*: Likelihood of defaulting

Income | Age | Children | Married | Likelihood of defaulting |
---|---|---|---|---|
2500 | 33 | 1 | 1 | 0 |
1200 | 42 | 3 | 1 | 1 |
0 | 18 | 2 | 0 | 1 |
… | … | … | … | … |
9000 | 28 | 0 | 0 | 0 |

A bank could use this to decide whether or not to grant a loan.