


Bayesian Analysis


Probability is a number between 0 and 1 that describes how likely an event is to occur. Under the frequentist interpretation, it is the fraction of times the event occurs over many repeated trials.
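To illustrate the long-run-frequency reading, the sketch below simulates flips of a fair coin (an illustrative assumption, not part of the original example) and prints the fraction of heads as the number of trials grows. The fraction should settle close to 0.5 as the trial count increases.

```python
import random


def relative_frequency(trials: int, seed: int = 0) -> float:
    """Fraction of simulated fair-coin flips that land heads."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials


# Small samples wander; large samples settle near the true probability 0.5.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```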

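If two events are independent (the occurrence of one does not change the probability of the other), the probability that both occur is the product of their individual probabilities:

P(outcome 1 AND outcome 2) = P(outcome 1) * P(outcome 2)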
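As a small check of the product rule, the sketch below uses a hypothetical example (one fair die roll and one fair coin flip, which are independent) and compares a direct count over all equally likely outcomes with P(six) * P(heads).

```python
from fractions import Fraction
from itertools import product

die = range(1, 7)
coin = ("H", "T")
outcomes = list(product(die, coin))  # 12 equally likely (die, coin) pairs

p_six = Fraction(1, 6)
p_heads = Fraction(1, 2)

# Count outcomes where both events happen and divide by the total number of outcomes.
p_both = Fraction(sum(1 for d, c in outcomes if d == 6 and c == "H"), len(outcomes))

print(p_both, p_six * p_heads)  # both print 1/12
```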

If the events are dependent, the second factor must be the conditional probability P(outcome 2 | outcome 1), the probability of outcome 2 occurring given that outcome 1 has already occurred:

P(outcome 1 AND outcome 2) = P(outcome1) * P(outcome 2 | outcome 1)
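Drawing two cards without replacement is a standard case of dependent events: once an ace is drawn, fewer aces remain. The sketch below (a hypothetical example, not from the original article) enumerates every ordered pair of distinct cards from a 52-card deck and compares the result with P(ace first) * P(ace second | ace first).

```python
from fractions import Fraction
from itertools import permutations

# 52-card deck: 4 aces ("A") and 48 other cards ("x"); indices keep cards distinct.
deck = ["A"] * 4 + ["x"] * 48
draws = list(permutations(range(52), 2))  # ordered draws of two different cards

both_aces = sum(1 for i, j in draws if deck[i] == "A" and deck[j] == "A")
p_enumerated = Fraction(both_aces, len(draws))

# Multiplication rule for dependent events: P(ace first) * P(ace second | ace first)
p_rule = Fraction(4, 52) * Fraction(3, 51)

print(p_enumerated, p_rule)  # both print 1/221
```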

If the events are mutually exclusive (they cannot both occur), the probability that one or the other occurs is the sum of their probabilities:

P(outcome 1 or outcome 2) = P(outcome 1) + P(outcome 2)
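The plain sum rule holds for mutually exclusive events. The sketch below checks it on a hypothetical single roll of a fair die, where "rolled a 1" and "rolled a 2" cannot both happen.

```python
from fractions import Fraction

FACES = range(1, 7)  # one roll of a fair six-sided die


def prob(event) -> Fraction:
    """Probability of an event, given as a predicate over die faces."""
    return Fraction(sum(1 for face in FACES if event(face)), len(FACES))


def rolled_one(face: int) -> bool:
    return face == 1


def rolled_two(face: int) -> bool:
    return face == 2


# The two events share no faces, so the probability of the union is the plain sum.
p_union = prob(lambda face: rolled_one(face) or rolled_two(face))
print(p_union, prob(rolled_one) + prob(rolled_two))  # both print 1/3
```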

If the events can occur together (they are not mutually exclusive), the general addition rule subtracts the probability of both occurring so that the overlap is not counted twice:

P(outcome 1 or outcome 2) = P(outcome 1) + P(outcome 2) - P(outcome1 AND outcome2)
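When the events overlap, the plain sum overcounts. The sketch below uses a hypothetical die roll with the events "even" ({2, 4, 6}) and "greater than 3" ({4, 5, 6}), which share faces 4 and 6, and compares the direct count, the general addition rule, and the naive sum.

```python
from fractions import Fraction

FACES = range(1, 7)  # one roll of a fair six-sided die


def prob(event) -> Fraction:
    """Probability of an event, given as a predicate over die faces."""
    return Fraction(sum(1 for face in FACES if event(face)), len(FACES))


def is_even(face: int) -> bool:
    return face % 2 == 0


def above_three(face: int) -> bool:
    return face > 3


p_union = prob(lambda face: is_even(face) or above_three(face))  # faces 2, 4, 5, 6
p_both = prob(lambda face: is_even(face) and above_three(face))  # faces 4, 6
p_rule = prob(is_even) + prob(above_three) - p_both
naive_sum = prob(is_even) + prob(above_three)  # counts faces 4 and 6 twice

print(p_union, p_rule, naive_sum)  # 2/3 2/3 1
```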


Consider a set of 180 items in which 60 are red, 60 are necklaces, and 30 are red necklaces.

P(Red) = Total Red / Total Items = 60/180 = 1/3

P(Necklace) = Total Necklaces / Total Items = 60/180 = 1/3

P(Red AND Necklace) = Total Red Necklaces / Total Items = 30/180 = 1/6
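The counts above can be combined with the rules from the previous section. The sketch below computes the same three probabilities with exact fractions, applies the general addition rule to get P(Red or Necklace), and checks whether the independence product rule would hold for these counts (it does not, so "red" and "necklace" are dependent here).

```python
from fractions import Fraction

total = 180
red = 60
necklace = 60
red_necklace = 30  # items that are both red and a necklace

p_red = Fraction(red, total)                        # 1/3
p_necklace = Fraction(necklace, total)              # 1/3
p_red_and_necklace = Fraction(red_necklace, total)  # 1/6

# General addition rule: subtract the overlap so red necklaces are counted once.
p_red_or_necklace = p_red + p_necklace - p_red_and_necklace
print(p_red_or_necklace)  # 1/2

# Independence check: the product rule gives 1/9, not the observed 1/6.
print(p_red * p_necklace == p_red_and_necklace)  # False
```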


Calculating Conditional Probability
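Conditional probability follows from rearranging the dependent multiplication rule: P(A | B) = P(A AND B) / P(B). Bayes' theorem, P(B | A) = P(A | B) * P(B) / P(A), then reverses the direction of the condition; it is the rule the Naive Bayes classifier in the next section is built on. The sketch below applies both formulas to the item counts from the red/necklace example.

```python
from fractions import Fraction

total = 180
p_red = Fraction(60, total)
p_necklace = Fraction(60, total)
p_red_and_necklace = Fraction(30, total)

# Conditional probability: P(Red | Necklace) = P(Red AND Necklace) / P(Necklace)
p_red_given_necklace = p_red_and_necklace / p_necklace

# Bayes' theorem reverses the condition:
# P(Necklace | Red) = P(Red | Necklace) * P(Necklace) / P(Red)
p_necklace_given_red = p_red_given_necklace * p_necklace / p_red

print(p_red_given_necklace)  # 1/2: half of the 60 necklaces are red
print(p_necklace_given_red)  # 1/2: half of the 60 red items are necklaces
```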


Bayesian Analysis in Python

Consider the Carseats dataset (from the ISLR package for R), on which we apply a Gaussian Naive Bayes classifier to predict whether carseat sales at a location exceed 4.

Import the libraries and load the dataset with pandas.

import pandas as pd
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Local path to Carseats.csv; change it to wherever the file is stored on your machine
Path = "D:\\DSA Course\\Datasets\\R Inbuilt Datasets\\Carseats.csv"
data = pd.read_csv(Path)

Create the target column Sale by labelling every Sales value above 4 as ‘Yes’ and every value of 4 or below as ‘No’. Then drop the original Sales column, factorize the categorical columns into integer codes, and separate the features from the label.

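# Binary target: 'Yes' when Sales exceeds 4, otherwise 'No' (so a value of exactly 4 is not left empty)
data.loc[data.Sales > 4, 'Sale'] = 'Yes'
data.loc[data.Sales <= 4, 'Sale'] = 'No'

class_names = data['Sale']
# Drop the numeric Sales column so it cannot leak the answer into the features
data = data.loc[:, data.columns != 'Sales'].copy()
# pd.factorize returns (integer codes, uniques); keep only the codes
data['Sale'], _ = pd.factorize(data['Sale'])
data['ShelveLoc'], _ = pd.factorize(data['ShelveLoc'])
data['Urban'], _ = pd.factorize(data['Urban'])
data['US'], _ = pd.factorize(data['US'])

X = data.loc[:, data.columns != 'Sale']
Y = data.Sale

feature_names = X.columns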

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=0)

nb = GaussianNB()
# train the model on the training split
nb.fit(X_train, y_train)
# make class predictions for X_test
y_pred_class = nb.predict(X_test)
# calculate accuracy of class predictions
NB_Accuracy = metrics.accuracy_score(y_test, y_pred_class)
# compute the confusion matrix (rows: true class, columns: predicted class)
cm = metrics.confusion_matrix(y_test, y_pred_class)
print('NB_Accuracy: {:.2f}'.format(NB_Accuracy))


NB_Accuracy: 0.93
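GaussianNB applies Bayes' theorem with the "naive" assumption that features are conditionally independent given the class, and models each feature within a class as a normal distribution. The pure-Python sketch below illustrates that idea on a hypothetical two-feature toy dataset; it is not scikit-learn's implementation, and the names fit_gaussian_nb and predict_gaussian_nb are illustrative. The small variance floor loosely mirrors the role of GaussianNB's var_smoothing parameter.

```python
import math
from collections import defaultdict


def _mean_var(rows):
    """Column-wise mean and population variance of a list of equal-length rows."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    variances = [sum((v - m) ** 2 for v in col) / n for col, m in zip(zip(*rows), means)]
    return means, variances


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate log class priors plus per-class feature means and variances."""
    _, overall_var = _mean_var(X)
    epsilon = var_smoothing * max(overall_var)  # variance floor to avoid division by zero
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        means, variances = _mean_var(rows)
        log_prior = math.log(len(rows) / len(X))  # log P(class)
        model[label] = (log_prior, means, [v + epsilon for v in variances])
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the largest log posterior (up to a shared constant)."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        # Naive assumption: the joint log-likelihood is a sum of per-feature Gaussian terms.
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
            for x, mu, var in zip(row, means, variances)
        )
    return max(model, key=log_posterior)


# Hypothetical toy data: two well-separated clusters labelled 0 and 1.
X_toy = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.2], [5.0, 6.0], [5.2, 5.9], [4.8, 6.1]]
y_toy = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X_toy, y_toy)
print(predict_gaussian_nb(model, [1.1, 2.0]))  # 0
print(predict_gaussian_nb(model, [5.1, 6.0]))  # 1
```

Working in log space turns the product of per-feature likelihoods into a sum, which avoids numerical underflow when there are many features.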
