Naive Bayes

Multinomial Naive Bayes

Andrew Mccallum and Kamal Nigam. A comparison of event models for naive bayes text classification. In AAAI-98 Workshop on ’Learning for Text Categorization’, 1998

General Description

Multinomial Naive Bayes models a document as a bag-of-words. For each class $c$, $ \Pr(w|c)$ (the probability of observing word $w$ given $c$) is estimated from the training data, simply by computing the relative frequency of each word in the collection of training documents, for that class. The classifier also requires the prior probability $\Pr(c)$, which is straightforward to estimate from the frequency of classes in the training set.

Assuming $n_{wd}$ is the number of times word $w$ occurs in document $d$, the probability of class $c$ given a test document is calculated as follows: $$\Pr(c|d) =\frac{\Pr(c) \prod_{w\in d} \Pr(w|c)^{n_{wd}}}{\Pr(d)},$$ where $\Pr(d)$ is a normalization factor. To avoid the zero frequency problem, it is common to use the Laplace correction for all conditional probabilities involved, which means all counts are initialized with a value of 1 instead of 0.

Implementation

In StreamDM, we have implemented the offline Multinomial Naive Bayes, for use in the other online algorithms implemented. The model is handled by the MultinomialNaiveBayesModel class, which keeps the class and feature statistics. The main algorithm is implemented in MultinomialNaiveBayes, which is controlled by the following options:

Number of features (-f), which sets the number of features in the input examples (3 by default);
Number of classes (-c), which sets the number of classes in the input examples (2 by default); and
Laplace smoothing (-s), which sets the smoothing factor for handling the zero frequency issue (1 by default).