Cluzters.ai

Cluzters.ai Bayesian Analysis

Article Information

Posted By : Raji Reddy A
Posted On : Feb 05, 2021
Category : Predictive and Prescriptive Modeling » Machine Learning
Description : Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability of a hypothesis as more evidence or information becomes available. Bayesian inference is an important technique in statistics, and especially in mathematical statistics.

Overview

Bayesian Analysis

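Probability is a numerical measure, between 0 and 1, of how likely an event is to occur; under the frequentist interpretation it is the proportion of times the event occurs over many repeated trials.

If two or more events occur together and they are independent, the probability of all of them occurring is the product of their individual probabilities: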

P(outcome1 AND outcome2) = P(outcome1) * P(outcome2)

If two or more events occur together and they are dependent, the conditional probability is used instead. P(outcome 2 | outcome 1) is the conditional probability of outcome 2 occurring given that outcome 1 has already occurred:

P(outcome 1 AND outcome 2) = P(outcome1) * P(outcome 2 | outcome 1)

If the events are mutually exclusive (they cannot both occur), the probability of either one event or the other occurring is the sum of their probabilities:

P(outcome 1 or outcome 2) = P(outcome 1) + P(outcome 2)

If the events are not mutually exclusive (both can occur together), the probability of both occurring is subtracted so that it is not counted twice:

P(outcome 1 or outcome 2) = P(outcome 1) + P(outcome 2) - P(outcome1 AND outcome2)
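
The multiplication and addition rules can be checked by enumerating a small sample space. The sketch below assumes one fair six-sided die; the events `even` and `above_four` and the helper `prob` are illustrative names, not part of the original article.

```python
from fractions import Fraction

# Sample space of one fair six-sided die (an illustrative assumption).
SAMPLE_SPACE = frozenset(range(1, 7))

def prob(event):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))

even = {2, 4, 6}       # outcome 1: the roll is even
above_four = {5, 6}    # outcome 2: the roll is greater than 4

p_both = prob(even & above_four)    # P(outcome 1 AND outcome 2)
p_either = prob(even | above_four)  # P(outcome 1 or outcome 2)

# Multiplication rule with a conditional probability:
# P(A AND B) = P(A) * P(B | A)
p_above_four_given_even = Fraction(len(even & above_four), len(even))
assert p_both == prob(even) * p_above_four_given_even

# These two events happen to be independent, so the plain product also holds.
assert p_both == prob(even) * prob(above_four)

# General addition rule: subtract the overlap so it is not counted twice.
assert p_either == prob(even) + prob(above_four) - p_both

# Mutually exclusive events (rolling a 1, rolling a 2): the overlap is empty.
assert prob({1} | {2}) == prob({1}) + prob({2})

print(p_both, p_either)  # 1/6 2/3
```

Using `Fraction` keeps the arithmetic exact, so the equalities can be asserted without floating-point tolerance.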

Example: suppose a collection of 180 items contains 60 red items, 60 necklaces, and 30 red necklaces.

P(Red) = Total Red/Total Items = 60/180 = 1/3

P(Necklace) = Total Necklace/Total Items = 60/180 = 1/3

P(Red Necklace) = Total Red Necklace/Total Items = 30/180 = 1/6

Calculating Conditional Probability
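
As a minimal sketch of this calculation, the counts given earlier (180 items, 60 red, 60 necklaces, 30 red necklaces) can be plugged into the definition P(A | B) = P(A AND B) / P(B). The variable names are illustrative, and the example assumes all of those counts describe the same collection of items.

```python
from fractions import Fraction

# Counts taken from the example in the text.
total_items = 180
red = 60
necklace = 60
red_necklace = 30

p_red = Fraction(red, total_items)
p_necklace = Fraction(necklace, total_items)
p_red_and_necklace = Fraction(red_necklace, total_items)

# Conditional probability: P(A | B) = P(A AND B) / P(B)
p_necklace_given_red = p_red_and_necklace / p_red
p_red_given_necklace = p_red_and_necklace / p_necklace

# Bayes' theorem: P(Red | Necklace) = P(Necklace | Red) * P(Red) / P(Necklace)
via_bayes = p_necklace_given_red * p_red / p_necklace
assert via_bayes == p_red_given_necklace

print(p_necklace_given_red, p_red_given_necklace)  # 1/2 1/2
```

Half of the red items are necklaces, and half of the necklaces are red; Bayes' theorem gives the same answer as the direct calculation.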

Bayesian Analysis in Python:

Consider the Carseats dataset, to which we apply a Gaussian Naive Bayes classifier to predict whether the sales of car seats are high.

Import the libraries and load the dataset with pandas:
```
import pandas as pd
```
```
from sklearn import metrics  # needed for accuracy_score and confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

Path = "D:\\DSA Course\\Datasets\\R Inbuilt Datasets\\Carseats.csv"
data = pd.read_csv(Path)
```
Label the rows whose Sales value is above 4 as 'Yes' and the rows below that threshold as 'No' in a new Sale column, which is the target column to predict. Then drop the original Sales column, factorize the categorical columns, and separate the feature and label columns:
```
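# Label rows with Sales above 4 as 'Yes' and the rest as 'No'
# (<= rather than < so that a value of exactly 4 is not left unlabeled)
data.loc[data.Sales > 4, 'Sale'] = 'Yes'
data.loc[data.Sales <= 4, 'Sale'] = 'No'

# Keep the original Yes/No labels for reference
class_names = data['Sale']

# Drop the raw Sales column so it cannot leak into the features
data = data.drop(columns='Sales')

# Encode the target and the categorical columns as integer codes
data['Sale'], _ = pd.factorize(data['Sale'])
data['ShelveLoc'], _ = pd.factorize(data['ShelveLoc'])
data['Urban'], _ = pd.factorize(data['Urban'])
data['US'], _ = pd.factorize(data['US'])
data.info()

# Separate the features (X) from the label (Y)
X = data.loc[:, data.columns != 'Sale']
Y = data.Sale

feature_names = X.columns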
```
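To see what the thresholding and `pd.factorize` steps produce, here is a small sketch on a toy frame. The values are invented for illustration and are not taken from Carseats.csv; `np.where` is used here as an alternative way to build the Yes/No column in one step.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Carseats data (values invented for illustration).
toy = pd.DataFrame({
    "Sales": [9.5, 3.2, 4.0, 7.4],
    "ShelveLoc": ["Bad", "Good", "Medium", "Bad"],
})

# Threshold the numeric column into a binary label.
toy["Sale"] = np.where(toy["Sales"] > 4, "Yes", "No")

# factorize returns integer codes plus the unique values,
# numbered in order of first appearance.
sale_codes, sale_labels = pd.factorize(toy["Sale"])
shelf_codes, shelf_labels = pd.factorize(toy["ShelveLoc"])

print(sale_codes.tolist(), sale_labels.tolist())    # [0, 1, 1, 0] ['Yes', 'No']
print(shelf_codes.tolist(), shelf_labels.tolist())  # [0, 1, 2, 0] ['Bad', 'Good', 'Medium']
```

Because the codes follow order of first appearance, which class receives code 0 depends on the first row of the data. Keeping the second return value of `pd.factorize` makes it possible to map predictions back to the original labels.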
```
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=0)

nb = GaussianNB()
# train the model
nb.fit(X_train, y_train)
# make class predictions for X_test
y_pred_class = nb.predict(X_test)
# calculate accuracy of class predictions
NB_Accuracy = metrics.accuracy_score(y_test, y_pred_class)
# compute the confusion matrix (use print(cm) to display it)
cm = metrics.confusion_matrix(y_test, y_pred_class)
print('NB_Accuracy: {:.2f}'.format(NB_Accuracy))
```
Output:
```
NB_Accuracy: 0.93
```
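
To connect the classifier back to Bayes' theorem, the following is a from-scratch sketch of the idea behind Gaussian Naive Bayes: each class gets a prior, each feature is modeled as a normal distribution within each class, and the posterior is the normalized product of the two. This is not scikit-learn's implementation; the functions `fit` and `posterior`, the small variance constant, and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def fit(rows, labels):
    """Estimate the class prior and per-feature mean/variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        prior = len(members) / len(rows)
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((x - mean) ** 2 for x in column) / len(column)
            stats.append((mean, var + 1e-9))  # small constant avoids division by zero
        model[label] = (prior, stats)
    return model

def posterior(model, row):
    """Return P(class | row) for every class using Bayes' theorem."""
    log_joint = {}
    for label, (prior, stats) in model.items():
        total = math.log(prior)  # log P(class)
        for x, (mean, var) in zip(row, stats):
            # log of the normal density, i.e. log P(feature | class)
            total += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        log_joint[label] = total
    # Divide by the evidence P(row); subtracting the peak keeps exp() stable.
    peak = max(log_joint.values())
    evidence = sum(math.exp(v - peak) for v in log_joint.values())
    return {label: math.exp(v - peak) / evidence for label, v in log_joint.items()}

# Invented toy data: [price, advertising] -> high sales?
rows = [[120, 0], [130, 2], [125, 1], [90, 10], [95, 12], [85, 11]]
labels = ["No", "No", "No", "Yes", "Yes", "Yes"]

model = fit(rows, labels)
probs = posterior(model, [92, 9])
prediction = max(probs, key=probs.get)
print(prediction, probs)
```

The new point lies close to the "Yes" rows on both features, so nearly all of the posterior probability goes to that class. Working in log space is a design choice that avoids numeric underflow when many small likelihoods are multiplied.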
For the code file, refer here: https://www.cluzters.ai/vault/274/1029/classification-algorithms-code