Chapter 1: Introduction

Linear Regression is one of the most famous and widely used machine learning algorithms. It assumes that the target variable can be explained as a linear combination of the input features. What does this mean? It means that the target can be viewed as a weighted sum of the features. Let’s use a practical example to illustrate this.

Let’s say that we are opening a restaurant: we make great food, but we need to decide how much to charge for it. We can be very pragmatic and say that the price of a meal is directly related to what is in it. We can, for instance, have a rule that each ingredient costs a certain amount, and based on how much of each ingredient there is in the dish, we can calculate its price. There may also be a fixed minimum price for each dish. Mathematically, this is called the intercept.

fixed_price = 5
ingredient_costs = {"meat": 10,
                    "fish": 13,
                    "vegetables": 2,
                    "fries": 3}


def price(**ingredients):
    """ Returns the price of a dish from the quantity of each ingredient. """

    # Every dish starts from the fixed minimum price (the intercept).
    cost = fixed_price

    # Each ingredient contributes its cost multiplied by its quantity.
    for name, quantity in ingredients.items():
        cost += ingredient_costs[name] * quantity

    return cost
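For example, a dish with one portion of meat and two portions of fries would be priced as follows:

print(price(meat=1, fries=2))   # 5 + 10*1 + 3*2 = 21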

Linear Regression assumes that the target, in this case the price, can be explained in exactly this way. The model will know the quantity of each ingredient, but it will have to infer what the fixed price is and what each ingredient costs.
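As a sketch of how this might look in code (the dish data below is made up, and scikit-learn is just one possible tool), we can generate prices with the rule above and check that the model recovers the fixed price and the ingredient costs:

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up dishes: each row holds the quantities of meat, fish, vegetables and fries.
X = np.array([[1.0, 0.0, 2.0, 1.0],
              [0.0, 1.0, 1.0, 0.0],
              [2.0, 0.0, 0.5, 1.5],
              [0.0, 0.5, 3.0, 0.0],
              [1.5, 0.0, 1.0, 2.0]])

# Prices computed with the pricing rule above (fixed price of 5 plus ingredient costs).
y = np.array([22.0, 20.0, 30.5, 17.5, 28.0])

model = LinearRegression()
model.fit(X, y)

print(model.intercept_)   # close to the fixed price: 5
print(model.coef_)        # close to the ingredient costs: [10, 13, 2, 3]

Because these prices follow the rule exactly, the learned intercept and coefficients match the true values; with real, noisy data they would only be approximations.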

It is important to remember that cost, in this situation, is rather abstract. It represents how much each feature affects the outcome, and in which direction. A feature can, for instance, have a negative cost.

In the univariate case, where there is only one feature, Linear Regression can be thought of as trying to fit a line through points.
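To make this concrete, here is a minimal sketch with invented one-dimensional data; np.polyfit with degree 1 performs exactly this kind of least-squares line fit:

import numpy as np

# Invented data: y is roughly 2*x + 1 plus a little noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Fit a straight line through the points.
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)   # roughly 2 and 1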

Now, Linear Regression is one of the most popular algorithms because it can do much more than fit straight lines through data. Indeed, with a simple trick, we can make it fit polynomial functions, making it much more powerful.

The trick is to “replace” the original features with their powers up to some chosen degree. In the univariate case, this comes down to not only using the feature itself but also its squared value, its cubed value, and so on. For instance, instead of using a single feature X = [2], we end up with the features X = [2, 4, 8, 16, 32], i.e. the powers of the original value up to degree 5. More features mean more weights for the model to learn, and these weights can express higher-degree, polynomial relationships between the features and the target.
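One way to do this in practice (a sketch, assuming scikit-learn is available) is with PolynomialFeatures:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# A single feature with value 2, as in the example above.
X = np.array([[2.0]])

# Expand it into its powers up to degree 5.
poly = PolynomialFeatures(degree=5, include_bias=False)
print(poly.fit_transform(X))   # [[ 2.  4.  8. 16. 32.]]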

A Linear Regression model’s goal is to find the coefficients, also called weights, that fit the data best. To define what “best” means, we need a loss function. This loss function, as we will see later, can be tweaked to alter how the weights are learned. We will also see that the weights that minimize the loss function can be found in different ways.
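To give a first taste of what a loss function can look like, here is a minimal sketch of the mean squared error, one common choice for Linear Regression (the numbers below are made up):

import numpy as np

def mean_squared_error(y_true, y_pred):
    """ The average of the squared differences between targets and predictions. """
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical true prices and the model's predictions.
y_true = np.array([22.0, 20.0, 30.5])
y_pred = np.array([21.0, 20.5, 31.0])
print(mean_squared_error(y_true, y_pred))   # 0.5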