Chapter 5: Multinomial Naïve Bayes

In certain situations, a Bernoulli model is not enough. Sometimes we need to know how many times a feature appears, not merely whether it appears at all.
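
For instance, in text classification each feature can hold the number of times a word occurs in a document rather than a simple presence flag. A minimal sketch of the difference (the vocabulary and document below are made up purely for illustration):

from collections import Counter

# Hypothetical vocabulary and document, used only to illustrate the idea.
vocabulary = ['money', 'free', 'meeting', 'report']
document = 'free money free free money now'.split()

counts = Counter(document)

# Count features: how many times each vocabulary word occurs.
x_counts = [counts[word] for word in vocabulary]           # [2, 3, 0, 0]

# Binary (Bernoulli-style) features: does the word occur at all?
x_binary = [int(counts[word] > 0) for word in vocabulary]  # [1, 1, 0, 0]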

It is important to note that in some cases, even though it might seem counter-intuitive, Bernoulli outperforms Multinomial. This is likely because a Multinomial distribution is only affected by the features that do appear, whereas Bernoulli also takes the absence of features into account.

from math import factorial as fact
from collections import Counter

import scipy.stats as ss
import numpy as np

from supervised.nb_classifier import NBClassifier


class MultinomialNB(NBClassifier):

    def __init__(self, alpha=1.0):

        super().__init__()
        self.alpha = alpha

    def _pdf(self, x, p):

        # Multinomial PMF: n! * prod(p_i ** x_i / x_i!), with n = sum(x_i).
        f = fact(np.sum(x))

        for P, X in zip(p, x):
            f *= (P**X) / fact(X)

        return f
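
As a quick sanity check, _pdf should agree with the multinomial PMF shipped with scipy.stats (imported above as ss). The counts and probabilities below are made up for illustration, and the formula is repeated outside the class so that the snippet runs on its own:

import numpy as np
import scipy.stats as ss
from math import factorial as fact

# Made-up count vector and event probabilities, for illustration only.
x = np.array([2, 1, 0])
p = np.array([0.5, 0.3, 0.2])

# Same computation as _pdf above: n! * prod(p_i ** x_i / x_i!).
f = fact(int(np.sum(x)))
for P, X in zip(p, x):
    f *= (P ** X) / fact(int(X))

# Reference value from scipy.
ref = ss.multinomial.pmf(x, n=int(np.sum(x)), p=p)

print(f, ref)   # both should equal 3 * 0.25 * 0.3 = 0.225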

Fitting

A Multinomial distribution is parameterized by a vector of event probabilities, one per feature. Estimating it only requires knowing how many times each feature occurs, so fitting the evidence amounts to summing the feature counts.

    def _fit_evidence(self, X):

        # Total count of each feature over all the samples.
        evidence_ = np.sum(X, axis=0)

        return evidence_

Fitting the likelihood then becomes trivial: it is the same as fitting the evidence, restricted to the samples of each class.

    def _fit_likelihood(self, X, y):

        likelihood_ = []

        for c in self.classes_:

            samples = X[y == c]  # only keep samples of class c

            likelihood_.append(self._fit_evidence(samples))

        return likelihood_
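
On a small, made-up dataset the fitted state is easy to inspect: the evidence is one count per feature, and the likelihood is one such count vector per class.

import numpy as np

# Made-up count matrix (4 samples, 3 features) and class labels, for illustration.
X = np.array([[2, 0, 1],
              [1, 1, 0],
              [0, 3, 1],
              [0, 2, 2]])
y = np.array([0, 0, 1, 1])

classes_ = np.unique(y)

# Evidence: total count of each feature over all the samples.
evidence_ = np.sum(X, axis=0)                               # [3 6 4]

# Likelihood: the same per-feature counts, restricted to each class.
likelihood_ = [np.sum(X[y == c], axis=0) for c in classes_]
# [array([3, 1, 1]), array([0, 5, 3])]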

Getting

Assuming that our model is trained, we need to make use of its state to compute the evidence and the likelihood. We can then reuse the _pdf method defined at the beginning.

The alpha parameter of the model controls the additive smoothing. Additive, or Laplace, smoothing helps maintain non-zero probabilities for features that were never observed. With alpha equal to zero the original probability estimates are kept, and as alpha grows the estimates approach a uniform distribution over the features.
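
As a quick numeric illustration of the smoothed estimate (x + a) / (N + a * d), with made-up counts:

# Made-up values, for illustration: a feature seen x = 0 times out of
# N = 10 total counts, with d = 4 features.
x, N, d = 0, 10, 4

for a in (0.0, 1.0, 100.0):
    print(a, (x + a) / (N + a * d))

# a = 0.0   -> 0.0     (no smoothing: an unseen feature gets zero probability)
# a = 1.0   -> 0.071   (Laplace smoothing keeps the probability non-zero)
# a = 100.0 -> 0.244   (large a pushes the estimate towards the uniform 1/d = 0.25)

The two methods below apply this estimate to each feature, once using the global counts and once using the per-class counts.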

    def _get_evidence(self, sample):

        p = []

        for i, feature in enumerate(sample):

            x = self.evidence_[i]          # count of feature i over all samples
            N = np.sum(self.evidence_)     # total count over all features
            d = len(sample)                # number of features
            a = self.alpha

            # Additive (Laplace) smoothed probability estimate.
            prob = (x + a) / (N + (a * d))

            p.append(prob)

        return self._pdf(sample, p)

    def _get_likelihood(self, sample, c):

        # Pick the count vector that belongs to class c.
        c_i = list(self.classes_).index(c)
        counts = self.likelihood_[c_i]

        p = []

        for i, feature in enumerate(sample):

            x = counts[i]          # count of feature i within class c
            N = np.sum(counts)     # total count over all features within class c
            d = len(sample)
            a = self.alpha

            # Additive (Laplace) smoothed probability estimate.
            prob = (x + a) / (N + (a * d))

            p.append(prob)

        return self._pdf(sample, p)
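
These two methods provide the quantities that Bayes' rule combines into a posterior. The standalone sketch below shows that combination with made-up counts and priors; the actual prediction logic lives in NBClassifier and may be organised differently.

import numpy as np

# Made-up per-class feature counts, priors, and a sample to classify.
likelihood_counts = [np.array([3, 1, 1]),     # class 0
                     np.array([0, 5, 3])]     # class 1
priors = np.array([0.5, 0.5])
alpha, d = 1.0, 3
sample = np.array([1, 2, 0])

posteriors = []
for counts, prior in zip(likelihood_counts, priors):
    # Smoothed class-conditional probabilities, as in _get_likelihood.
    p = (counts + alpha) / (np.sum(counts) + alpha * d)
    # The multinomial factor n! / prod(x_i!) is the same for every class,
    # so it cancels when the posteriors are compared.
    posteriors.append(prior * np.prod(p ** sample))

posteriors = np.array(posteriors) / np.sum(posteriors)   # normalise
print(posteriors)   # the class with the highest posterior wins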

Updating

Updating the model means that, given new data, the stored feature counts have to be incremented.

    def _update_evidence(self, X):

        # Add the counts of the new samples to the existing totals.
        self.evidence_ += np.sum(X, axis=0)

        return self.evidence_

    def _update_likelihood(self, X, y):

        for i, c in enumerate(self.classes_):
            samples = X[y == c]   # only keep samples of class c

            self.likelihood_[i] += np.sum(samples, axis=0)

        return self.likelihood_
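
Because the fitted state consists of nothing but raw counts, updating with a new batch yields exactly the same state as refitting on all of the data at once. A small check with made-up batches:

import numpy as np

# Two made-up batches of count features, for illustration only.
X_batch1 = np.array([[2, 0, 1],
                     [1, 1, 0]])
X_batch2 = np.array([[0, 3, 1],
                     [0, 2, 2]])

# Fit on the first batch, then update with the second...
evidence_ = np.sum(X_batch1, axis=0)
evidence_ = evidence_ + np.sum(X_batch2, axis=0)

# ...which matches fitting on both batches at once.
print(evidence_)                                          # [3 6 4]
print(np.sum(np.vstack([X_batch1, X_batch2]), axis=0))    # [3 6 4]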