Chapter 2: Naïve Bayes Classification

Here, we will show how to implement a Naïve Bayes classifier from scratch. We start with the predict method, which computes the joint probability of each class, as defined in _predict_joint_proba, and returns the class with the highest joint probability.

As described above, the joint probability is defined as \(\text{prior} \times \text{likelihood}\), that is, \(P(y, x) = P(y) \times P(x \mid y)\).
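
For reference, Bayes' rule ties the three quantities together; the evidence \(P(x)\) will come into play later, when the joint probabilities are turned into proper probability estimates:

\[P(y \mid x) = \frac{P(y) \times P(x \mid y)}{P(x)} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}\]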

In order to compute the likelihood and the evidence, however, one must assume a probability distribution for the features, and there are many to choose from. This is why we often refer to Naïve Bayes classifiers, in the plural: there is one for each choice of distribution. A sketch of one such distribution-specific variant is given at the end of this section.

import numpy as np


class NBClassifier(object):

    def predict(self, X, y=None):

        # Pick, for each sample, the class with the highest joint probability
        joint_probas = self._predict_joint_proba(X)
        indices = np.argmax(joint_probas, axis=1)

        return self.classes_[indices]

    def _predict_joint_proba(self, X, y=None):

        # Joint probability of each (sample, class) pair: prior * likelihood
        return np.array([[self._get_prior(c) * self._get_likelihood(sample, c) for c in self.classes_]
                         for sample in X])

In order to obtain actual probability estimates, the joint probabilities have to be divided by the evidence. For instance, joint probabilities of 0.03 and 0.01 for two classes yield estimates of 0.75 and 0.25 once divided by the evidence \(0.03 + 0.01 = 0.04\).

    def predict_proba(self, X, y=None):

        # Normalise the joint probabilities by the evidence so that each row sums to 1
        joint_probas = self._predict_joint_proba(X, y)
        evidence = np.array([[self._get_evidence(x)] for x in X])

        return joint_probas / evidence

Of course, this is only possible if the values of these probabilities are known. We therefore need to estimate the prior, likelihood, and evidence for each class from the available data. Naïve Bayes classifiers are well suited to online learning, as their distributions can easily be updated when new data arrives.

    def fit(self, X, y):

        # Estimate the prior, likelihood, and evidence from the training data
        self.class_counts_ = self._fit_prior(y)
        self.likelihood_ = self._fit_likelihood(X, y)
        self.evidence_ = self._fit_evidence(X)

        return self

    def update(self, X, y):

        # Refresh the same quantities incrementally with a new batch of data
        self.class_counts_ = self._update_priors(y)
        self.likelihood_ = self._update_likelihood(X, y)
        self.evidence_ = self._update_evidence(X)

        return self

For most distributions, computing the prior simply amounts to keeping track of class frequencies.

    def _fit_prior(self, y):

        # Record the classes and how many times each one occurs
        self.classes_, self.class_counts_ = np.unique(y, return_counts=True)

        return self.class_counts_

    def _get_prior(self, c):

        # The prior of class c is its relative frequency in the data seen so far
        idx = np.flatnonzero(self.classes_ == c)[0]

        return self.class_counts_[idx] / np.sum(self.class_counts_)

    def _update_priors(self, y):

        # Add the new counts to the existing ones, class by class; this assumes
        # no previously unseen class appears in y
        classes, counts = np.unique(y, return_counts=True)
        for c, count in zip(classes, counts):
            idx = np.flatnonzero(self.classes_ == c)[0]
            self.class_counts_[idx] += count

        return self.class_counts_
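
To make the missing pieces concrete, here is a minimal sketch of a distribution-specific subclass, assuming that each feature follows a Gaussian distribution within each class. The class name, the attribute names means_ and vars_, and the absence of variance smoothing are illustrative assumptions rather than part of the original implementation; the online counterparts _update_likelihood and _update_evidence are omitted for brevity.

class GaussianNBClassifier(NBClassifier):
    """Hypothetical subclass assuming Gaussian features (sketch only)."""

    def _fit_likelihood(self, X, y):

        # One mean and one variance per (class, feature) pair
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_])

        return self.means_, self.vars_

    def _get_likelihood(self, sample, c):

        # The naive assumption: multiply the per-feature Gaussian densities
        idx = np.flatnonzero(self.classes_ == c)[0]
        mean, var = self.means_[idx], self.vars_[idx]
        densities = np.exp(-(sample - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

        return np.prod(densities)

    def _fit_evidence(self, X):

        # The evidence follows from the priors and likelihoods, so nothing extra is stored
        return None

    def _get_evidence(self, sample):

        # P(x) is the joint probability summed over all classes
        return sum(self._get_prior(c) * self._get_likelihood(sample, c)
                   for c in self.classes_)

With this sketch in place, the methods defined above can be exercised end to end:

X = np.array([[1.0], [1.2], [3.9], [4.1]])
y = np.array([0, 0, 1, 1])

model = GaussianNBClassifier().fit(X, y)
model.predict(np.array([[1.1], [4.0]]))   # array([0, 1])
model.predict_proba(np.array([[1.1]]))    # rows sum to 1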