Chapter 1: What is a model?

The concept of a model is central to machine learning. To understand what a model is, let’s take a step back and think about how humans make decisions. When we are born, though we have some primal instincts, we don’t know much about the world around us. As we grow up and experience more, we learn, and we start to better understand how the world works. We start to understand language, we become better at sensing moods, at predicting the near future, and so on.

More formally, you could say that throughout our lives, we accumulate information, turn it into knowledge, and store that knowledge in our brains. We then make all of our subsequent choices based on it. Machine learning models are no different: they are fed information, turn it into knowledge about how the world works, and make decisions accordingly. While we humans keep our knowledge in our brains, machine learning models store it in mathematical structures. We typically refer to the knowledge a model has as the state of that model. Intuitively, the state of a model is something that can be saved to a file. In our spam filter example, for instance, the model’s state was the list of blocked senders and blacklisted words.
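To make the "saved to a file" intuition concrete, here is a minimal sketch of a toy spam filter whose state is just two lists. The sender address and words are invented for illustration; they are not from the text.

```python
import json

# Hypothetical toy state: two lists the filter has "learned".
state = {
    "blocked_senders": ["promo@spam.example"],
    "blacklisted_words": ["lottery", "jackpot"],
}

def save_state(state, path):
    # The model's entire knowledge fits in a plain JSON file.
    with open(path, "w") as f:
        json.dump(state, f)

def load_state(path):
    with open(path) as f:
        return json.load(f)

save_state(state, "spam_filter_state.json")
restored = load_state("spam_filter_state.json")
assert restored == state  # the model's knowledge survives a round trip
```

Saving and reloading the state gives back an identical model, which is exactly what we mean when we say the state is the model's knowledge.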

When deciding to build a model, there are four major questions that one must consider.

How complex should the model’s state be?

A big difference between machine learning models and humans is that humans have a single model that they use for everything. In machine learning, at least for now, this is not possible: models are typically built for very specific tasks, and so they vary quite a lot depending on the task. For instance, a machine learning model aimed at filtering spam e-mails will likely need to be different from a model whose goal is to drive a car.

The more complex a model’s state is, the more knowledge it can hold, and the better suited it is for complicated tasks. But a more complex state is also more difficult to learn. In the spam filter example, a more complex state might mean that on top of blocked senders and words, we also keep a list of whitelisted senders or words, or flag certain combinations of words instead of single words.
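The richer state described above can be sketched as follows. All senders, words, and word pairs here are hypothetical, chosen only to show how the extra pieces of state interact.

```python
# Hypothetical richer spam-filter state: blocked senders and words,
# plus a sender whitelist and flagged word *pairs*.
state = {
    "blocked_senders": {"promo@spam.example"},
    "blacklisted_words": {"lottery"},
    "whitelisted_senders": {"friend@mail.example"},
    "blacklisted_pairs": {("free", "money")},
}

def is_spam(sender, text, state):
    words = text.lower().split()
    if sender in state["whitelisted_senders"]:
        return False  # whitelist overrides every other rule
    if sender in state["blocked_senders"]:
        return True
    if any(w in state["blacklisted_words"] for w in words):
        return True
    # a pair only triggers when both of its words appear in the message
    return any(a in words and b in words for a, b in state["blacklisted_pairs"])

print(is_spam("friend@mail.example", "free money inside", state))    # False
print(is_spam("unknown@mail.example", "get free money now", state))  # True
```

Note how each new piece of state (whitelist, pairs) lets the filter express decisions the simpler state could not, at the cost of more things to learn.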

How will the model learn its state?

As explained above, more complex states can solve more complicated tasks, but they are typically harder to learn. In theory, it is sometimes possible to try every possible state and pick the one that performs best. This is a perfectly valid way to do machine learning, but it is very rarely feasible: in many cases, there is an infinite number of states to go through. Much of the research in the field is dedicated to finding smart ways of reaching the best state without needing to try them all.
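Exhaustive search over states is only feasible when the state is tiny. The sketch below makes that concrete with an invented toy setup: the "state" is a single blacklisted word, and we simply enumerate every candidate and keep the one that classifies a small labeled set best.

```python
# Hypothetical toy example: the model's entire state is one blacklisted word.
# With so few candidate states, we can afford to try them all.
emails = [
    ("win the lottery now", True),    # (text, is_spam)
    ("lunch tomorrow?", False),
    ("lottery jackpot inside", True),
    ("project update", False),
]
vocabulary = {"lottery", "lunch", "project", "win"}

def accuracy(word, emails):
    # Fraction of e-mails where "contains the word" matches the spam label.
    return sum((word in text.split()) == label for text, label in emails) / len(emails)

# Brute force: evaluate every candidate state, keep the best one.
best_state = max(vocabulary, key=lambda w: accuracy(w, emails))
print(best_state)  # "lottery" flags both spam e-mails and no legitimate ones
```

With four candidate states this is trivial; with a state made of thousands of real-valued parameters, the same enumeration becomes impossible, which is why smarter search methods matter.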

Once learned, how should the model use its state?

A model learns a state which then allows it to make decisions. Models make decisions in wildly different ways: some use decision trees, some use analogies based on previous observations, some use Bayesian methods, and so on. Depending on the application, some models might be faster at making predictions, some might be better at explaining how they reached their decision, and some may need less training data to generalize. Typically, the choice of what the state will be and how it will be used are tied together.
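As one hypothetical illustration of an "analogy-based" decision: a nearest-neighbour model's state is simply its stored training examples, and using that state means looking up the closest one. The points and labels below are made up.

```python
# Hypothetical analogy-based model: its state IS the stored examples.
# Each example is ((feature_1, feature_2), label).
training = [((1.0, 1.0), "spam"), ((8.0, 9.0), "ham"), ((2.0, 0.5), "spam")]

def predict(point, training):
    def dist(a, b):
        # squared Euclidean distance; no square root needed for comparison
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    # "Using the state" here is a nearest-neighbour lookup, not an explicit rule.
    _, label = min(training, key=lambda ex: dist(ex[0], point))
    return label

print(predict((1.5, 1.0), training))  # "spam": its closest stored example is spam
```

A decision-tree model trained on the same data would use its state very differently, traversing learned if/else splits instead of comparing against stored examples.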

How much data is available to train the model?

All three aspects that we discussed above depend heavily on the amount and quality of the data available to train the model. Models need good information, and enough of it, to perform well at their task. If we don’t have access to much data, we might want to pick a simple model that generalizes better, whereas if we have tons of data, we might prefer a more complex one.

This is similar to how humans learn: to master very complex concepts, we need quality textbooks, courses, and so on, and plenty of them.