# Naive Bayes in its easiest form

--

**Introduction**

Naïve Bayes is a probabilistic supervised machine learning algorithm based on Bayes’ theorem. It is used in various classification tasks, but mainly in text classification, which typically involves high-dimensional training data. Since it is a probabilistic classifier, it predicts on the basis of the probability of an object.

**Bayes theorem**

Bayes’ theorem, also known as *Bayes’ Rule* or *Bayes’ Law*, is used to determine the probability of a hypothesis with prior knowledge. It depends on conditional probability.

**Conditional probability** is a measure of the probability of an event occurring given that another event has (by assumption, presumption, assertion, or evidence) occurred.

The formula for Bayes’ theorem is:

P(A|B) = P(B|A) * P(A) / P(B)

**P(A|B) is Posterior probability**: Probability of hypothesis A on the observed event B.

**P(B|A) is Likelihood probability**: Probability of the evidence B given that hypothesis A is true.

**P(A) is Prior Probability:** Probability of hypothesis before observing the evidence.

**P(B) is Marginal Probability**: Probability of Evidence.

In simpler terms, Bayes’ Theorem is a way of finding a probability when we know certain other probabilities.
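As a quick sanity check, Bayes’ theorem can be evaluated directly in code. The numbers below (a 1% prior and illustrative test accuracies for a hypothetical diagnostic test) are made up for demonstration, not taken from the article:

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Hypothetical example: A = "has the disease", B = "test is positive".
p_a = 0.01              # P(A): prior probability of the hypothesis
p_b_given_a = 0.95      # P(B|A): likelihood of a positive test if diseased
p_b_given_not_a = 0.05  # P(B|not A): false-positive rate

# Marginal probability of the evidence, P(B), via total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior probability P(A|B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # about 0.161 despite the accurate test
```

Even with a fairly accurate test, the posterior stays low because the prior P(A) is small — exactly the kind of effect Bayes’ theorem captures.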

**How Naïve Bayes algorithm works**

To start with, let us consider a fictional data set.

Consider the car theft problem with attributes Color, Type, and Origin, and the target, Stolen, which can be either Yes or No.

**Assumptions**

The fundamental Naive Bayes assumption is that each feature makes an:

**1. independent**

**2. equal**

contribution to the outcome. This means:

1. No pair of features is dependent, i.e. the color being ‘Red’ has nothing to do with the Type or the Origin of the car. Hence, the features are assumed to be *independent*.

2. Each feature is given the same influence, i.e. knowing the Type and Origin alone can’t predict the outcome perfectly. So none of the attributes is irrelevant, and each is assumed to contribute *equally* to the outcome.

**Note**: these assumptions are generally not correct in real-world situations. The assumption of **independence** in particular is almost never exactly correct, but it often works well in practice. That is why the algorithm is called **Naïve**.

Now, given the features of the car, our task is to classify whether the car is stolen or not.

The columns represent these features and the rows represent individual entries. If we take the first row of the dataset, we can observe that the car is stolen when the Color is Red, the Type is Sports, and the Origin is Domestic. So we want to classify whether a Red Domestic SUV will get stolen or not. Note that there is no example of a Red Domestic SUV in our data set.

According to this example, Bayes’ theorem can be rewritten as:

P(y|X) = P(X|y) * P(y) / P(X)

The variable y is the class variable and X is a dependent feature vector of size n, where

X = (x1, x2, …, xn)

Here x1, x2, …, xn represent the features.

By substituting for X and expanding using the chain rule (together with the naive independence assumption), we get:

P(y|x1, …, xn) = P(x1|y) * P(x2|y) * … * P(xn|y) * P(y) / (P(x1) * P(x2) * … * P(xn))

Since the denominator remains constant for a given input, this can be written as:

P(y|x1, …, xn) ∝ P(y) * P(x1|y) * P(x2|y) * … * P(xn|y)

**For this case, our class variable has only two outcomes. In case of more than two possible outcomes, we have to find the class variable with maximum probability.**

The posterior probability P(y|X) can be calculated by first creating a frequency table for each feature against the target, then building a likelihood table from those frequencies, and finally applying the naive Bayes equation to compute the posterior probability for each class. The class with the highest posterior probability is the outcome of our prediction.

**Frequency table for color:**

**Likelihood table**

**Frequency and likelihood table for type:**

**Frequency and likelihood table for origin:**

Now, in our example, we have three predictors, X = (Red, SUV, Domestic).

From the equation discussed above, we can calculate the posterior probability of Yes:

P(Yes|X) ∝ P(Red|Yes) * P(SUV|Yes) * P(Domestic|Yes) * P(Yes)

And P(No|X):

P(No|X) ∝ P(Red|No) * P(SUV|No) * P(Domestic|No) * P(No)

Since 0.072 > 0.024, our example is classified as ‘No’: the car is not stolen.
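The article’s tables are not reproduced here, but the whole calculation can be sketched in code. The 10-row dataset below is a hypothetical reconstruction whose counts are consistent with the 0.024 and 0.072 posteriors worked out above; the helper name `posterior` is my own:

```python
# Hypothetical car-theft dataset: (Color, Type, Origin, Stolen),
# chosen so the frequency counts reproduce the article's 0.024 and 0.072.
data = [
    ("Red",    "Sports", "Domestic", "Yes"),
    ("Red",    "Sports", "Domestic", "No"),
    ("Red",    "Sports", "Domestic", "Yes"),
    ("Yellow", "Sports", "Domestic", "No"),
    ("Yellow", "Sports", "Imported", "Yes"),
    ("Yellow", "SUV",    "Imported", "No"),
    ("Yellow", "SUV",    "Imported", "Yes"),
    ("Yellow", "SUV",    "Domestic", "No"),
    ("Red",    "SUV",    "Imported", "No"),
    ("Red",    "Sports", "Imported", "Yes"),
]

def posterior(features, label):
    """P(label) times the product of P(feature_i | label); denominator dropped."""
    rows = [r for r in data if r[-1] == label]
    prob = len(rows) / len(data)  # prior P(label) from class frequency
    for i, value in enumerate(features):
        # likelihood P(x_i | label) from the frequency table
        prob *= sum(1 for r in rows if r[i] == value) / len(rows)
    return prob

x = ("Red", "SUV", "Domestic")
p_yes = posterior(x, "Yes")  # 1/2 * 3/5 * 1/5 * 2/5 = 0.024
p_no = posterior(x, "No")    # 1/2 * 2/5 * 3/5 * 3/5 = 0.072
print(p_yes, p_no)  # prediction: "No", since 0.072 > 0.024
```

The same comparison generalizes to any number of classes: compute the unnormalized posterior for each class and pick the maximum.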

**Advantages of Naïve Bayes Classifier:**

- It is one of the fastest and easiest ML algorithms for predicting the class of a data set.
- It performs well in multi-class prediction.
- When the assumption of independence holds, a Naive Bayes classifier performs better compared to other models like logistic regression, and it needs less training data.
- It performs well with categorical input variables compared to numerical variable(s).

**Disadvantages of Naïve Bayes Classifier:**

- If a categorical variable has a category in the test data set which was not observed in the training data set, the model will assign it a 0 (zero) probability and will be unable to make a prediction. This is often known as the “Zero Frequency” problem. To solve it, we can use a smoothing technique; one of the simplest is Laplace estimation.
- Another disadvantage of Naive Bayes is the assumption of independent predictors. In real life, it is very hard to find predictors that are completely independent.
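To see how Laplace estimation fixes the zero-frequency problem: add-one smoothing adds 1 to every count, so a category never seen with a class still gets a small, non-zero likelihood. A minimal sketch (the counts and category names here are hypothetical):

```python
def smoothed_likelihood(count, class_total, n_categories, alpha=1):
    # Laplace (add-alpha) smoothing: add alpha to the numerator and
    # alpha * (number of categories) to the denominator, so the
    # smoothed likelihoods still sum to 1 over all categories.
    return (count + alpha) / (class_total + alpha * n_categories)

# Hypothetical: the color "Green" never appears among 5 stolen cars,
# and Color has 3 categories (Red, Yellow, Green).
print(smoothed_likelihood(0, 5, 3))  # 1/8 = 0.125 instead of 0/5 = 0
```

With alpha = 0 the estimate collapses back to the raw frequency, which is exactly what produces the zero probability for unseen categories.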

**Types of Naïve Bayes Model:**

There are three types of Naive Bayes Model, which are given below:

**Gaussian**: The Gaussian model assumes that features follow a normal distribution. This means if predictors take continuous values instead of discrete, then the model assumes that these values are sampled from the Gaussian distribution.

**Multinomial**: The Multinomial Naïve Bayes classifier is used when the data is multinomially distributed. It is primarily used for document classification problems, i.e. determining which category a particular document belongs to, such as Sports, Politics, Education, etc.

The classifier uses the frequency of words for the predictors.

**Bernoulli**: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor variables are independent Boolean variables, such as whether a particular word is present or not in a document. This model is also popular for document classification tasks.
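If scikit-learn is available, the three variants map directly to `GaussianNB`, `MultinomialNB`, and `BernoulliNB`. A minimal sketch on toy data (all numbers below are made up for illustration):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])  # two toy classes

# Gaussian: continuous features, assumed normally distributed per class
X_cont = np.array([[1.0, 2.1], [0.9, 1.9], [3.0, 4.2], [3.1, 3.9]])
print(GaussianNB().fit(X_cont, y).predict([[1.0, 2.0]]))

# Multinomial: count features, e.g. per-document word counts
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
print(MultinomialNB().fit(X_counts, y).predict([[1, 3, 0]]))

# Bernoulli: binary features, e.g. word present / absent
X_bin = (X_counts > 0).astype(int)
print(BernoulliNB().fit(X_bin, y).predict([[1, 0, 1]]))
```

Note how the Bernoulli variant only keeps presence/absence information from the same counts the Multinomial variant uses in full.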

If there is something wrong or you have a suggestion for me, please reach out to me.

Thank you.