# Chapter 5: Multinomial Naïve Bayes

In certain situations, a Bernoulli model is not enough: sometimes we need to know how many times a feature appears, not just whether it appears at all.

It is important to note that in some cases, even though it might seem counter-intuitive, Bernoulli outperforms Multinomial. This is likely because a Multinomial distribution is only affected by the features that do appear, whereas Bernoulli also takes the absence of features into account.

```python
from math import factorial as fact

import numpy as np

from supervised.nb_classifier import NBClassifier


class MultinomialNB(NBClassifier):

    def __init__(self, alpha=1.0):
        super().__init__()
        self.alpha = alpha  # additive (Laplace) smoothing parameter

    def _pdf(self, x, p):
        # Multinomial PMF: n! / (x_1! ... x_d!) * prod(p_i ** x_i)
        f = fact(np.sum(x))

        for P, X in zip(p, x):
            f *= (P ** X) / fact(X)

        return f
```
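For reference, the probability mass function that `_pdf` computes is the multinomial PMF. For a vector of counts x = (x_1, …, x_d) and event probabilities p = (p_1, …, p_d):

```latex
P(x_1, \dots, x_d \mid p_1, \dots, p_d)
  = \frac{n!}{x_1! \, x_2! \cdots x_d!} \prod_{i=1}^{d} p_i^{x_i},
\qquad n = \sum_{i=1}^{d} x_i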

## Fitting

A Multinomial distribution is parameterized by a vector of event probabilities, which we estimate from the number of times each feature occurs. Fitting the evidence therefore amounts to summing the feature counts over all samples.

```python
def _fit_evidence(self, X):
    # Total number of occurrences of each feature across all samples.
    evidence_ = np.sum(X, axis=0)

    return evidence_
```
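As a quick standalone sanity check (toy data, not using the class itself), summing a document-term count matrix along axis 0 gives exactly the per-feature totals that `_fit_evidence` stores:

```python
import numpy as np

# Toy document-term matrix: 3 documents, 4 vocabulary words.
X = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 3, 0, 1],
])

# Per-feature totals across all documents, as _fit_evidence computes them.
evidence_ = np.sum(X, axis=0)
print(evidence_)  # → [3 4 1 1]
```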

Fitting the likelihood is then straightforward: it amounts to fitting the evidence separately for each class.

```python
def _fit_likelihood(self, X, y):
    likelihood_ = []

    for c in self.classes_:
        samples = X[y == c]  # only keep samples of class c
        likelihood_.append(self._fit_evidence(samples))

    return likelihood_
```
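A standalone illustration (toy data, outside the class) of how the boolean mask `X[y == c]` isolates one class's rows before the counts are summed:

```python
import numpy as np

X = np.array([
    [2, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
])
y = np.array(["spam", "ham", "spam"])

# Per-class feature counts, mirroring _fit_likelihood.
likelihood_ = [np.sum(X[y == c], axis=0) for c in ("ham", "spam")]
print(likelihood_[0])  # counts for "ham"  → [0 1 1]
print(likelihood_[1])  # counts for "spam" → [3 1 1]
```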

## Getting

Assuming that our model is trained, we need to be able to use its state to compute the evidence and the likelihood. We can then reuse the *_pdf* method that was defined at the beginning.

The **alpha** parameter of the model parameterizes the additive smoothing. Additive, or Laplace, smoothing helps maintain non-zero probabilities: no smoothing keeps the original probability estimates, while maximum smoothing pushes all probabilities towards uniform.
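The effect of alpha can be seen on a toy count vector (a standalone sketch of the `(x + a) / (N + a * d)` estimate used below): with alpha = 0 an unseen feature gets probability zero, while alpha = 1 keeps every estimate strictly positive:

```python
import numpy as np

counts = np.array([4, 1, 0])  # the third feature was never observed
N, d = counts.sum(), len(counts)

def smoothed(alpha):
    # (x + alpha) / (N + alpha * d), the additive-smoothing estimate
    return (counts + alpha) / (N + alpha * d)

print(smoothed(0.0))  # → [0.8 0.2 0. ]    unseen feature is zeroed out
print(smoothed(1.0))  # → [0.625 0.25 0.125]
```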

```python
def _get_evidence(self, sample):
    p = []

    for i, feature in enumerate(sample):
        x = self.evidence_[i]        # occurrences of feature i
        N = np.sum(self.evidence_)   # total occurrences of all features
        d = len(sample)              # number of features
        a = self.alpha

        prob = (x + a) / (N + a * d)  # smoothed probability estimate
        p.append(prob)

    return self._pdf(sample, p)

def _get_likelihood(self, sample, c):
    p = []

    for i, feature in enumerate(sample):
        # c indexes the class, so we look up that class's counts.
        x = self.likelihood_[c][i]
        N = np.sum(self.likelihood_[c])
        d = len(sample)
        a = self.alpha

        prob = (x + a) / (N + a * d)
        p.append(prob)

    return self._pdf(sample, p)
```

## Updating

Updating the model means that, given new data, the feature counts have to be updated.

```python
def _update_evidence(self, X):
    self.evidence_ += np.sum(X, axis=0)

    return self.evidence_

def _update_likelihood(self, X, y):
    for i, c in enumerate(self.classes_):
        samples = X[y == c]  # only keep samples of class c
        self.likelihood_[i] += np.sum(samples, axis=0)

    return self.likelihood_
```
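Because the sufficient statistics are plain counts, an incremental update is just an addition: updating on a new batch yields the same totals as refitting on all the data. A standalone sketch of this property:

```python
import numpy as np

X_old = np.array([[1, 0], [2, 1]])
X_new = np.array([[0, 3]])

evidence_ = np.sum(X_old, axis=0)               # initial fit
evidence_ = evidence_ + np.sum(X_new, axis=0)   # update, as in _update_evidence

# Identical to fitting on the concatenated data in one pass.
full = np.sum(np.vstack([X_old, X_new]), axis=0)
print(evidence_, full)  # → [3 4] [3 4]
```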