Major Concepts

Bayesian Analysis


Probability is a number between 0 and 1 that describes how likely an event is to occur. It can be read as the proportion of times the event occurs over many repeated trials.


If two or more events occur together and they are independent, the joint probability is the product of the individual probabilities:


P(outcome 1 AND outcome 2) = P(outcome 1) * P(outcome 2)


If two or more events occur together and they are dependent, the joint probability uses a conditional probability. P(outcome 2 | outcome 1) is the conditional probability of outcome 2 occurring given that outcome 1 has already occurred.


P(outcome 1 AND outcome 2) = P(outcome 1) * P(outcome 2 | outcome 1)


Probability of either one event or the other occurring, if they are mutually exclusive (they cannot both happen):


P(outcome 1 or outcome 2) = P(outcome 1) + P(outcome 2)


Probability of either one event or the other occurring, if they are not mutually exclusive (both can happen simultaneously):


P(outcome 1 or outcome 2) = P(outcome 1) + P(outcome 2) - P(outcome 1 AND outcome 2)
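The product rule and the addition rule can be checked by counting outcomes directly. The sketch below is an illustration only: the two-dice sample space and the events chosen (first die even, second die showing six) are assumptions made for this example and do not come from the original text.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice
# (an illustrative assumption, not part of the original example).
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event given as a predicate over (first, second)."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))


def first_even(o):
    return o[0] % 2 == 0      # event A: first die is even


def second_six(o):
    return o[1] == 6          # event B: second die shows six


p_a = prob(first_even)
p_b = prob(second_six)
p_a_and_b = prob(lambda o: first_even(o) and second_six(o))
p_a_or_b = prob(lambda o: first_even(o) or second_six(o))

# Independent events: P(A AND B) = P(A) * P(B)
print(p_a_and_b == p_a * p_b)

# Events that can both happen: P(A or B) = P(A) + P(B) - P(A AND B)
print(p_a_or_b == p_a + p_b - p_a_and_b)
```

Exact fractions are used instead of floats so that the equalities hold without rounding error.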


                
               


Example: out of 180 items in total, 60 are red, 60 are necklaces, and 30 are red necklaces.


P(Red) = ?


P(Red) = Total Red/Total Items = 60/180


 


P(Necklace) = ?


P(Necklace) = Total Necklace/Total Items = 60/180


 


P(Red Necklace) = ?


P(Red Necklace) = Total Red Necklace/Total Items = 30/180
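With these three probabilities, one can check whether "Red" and "Necklace" behave as independent events and compute the probability of an item being red or a necklace. The following is a minimal sketch that uses only the counts given above; the variable names are chosen here for illustration.

```python
from fractions import Fraction

total_items = 180
p_red = Fraction(60, total_items)           # P(Red)
p_necklace = Fraction(60, total_items)      # P(Necklace)
p_red_necklace = Fraction(30, total_items)  # P(Red AND Necklace)

# If the two events were independent, the joint probability would
# equal the product of the individual probabilities.
expected_if_independent = p_red * p_necklace
is_independent = p_red_necklace == expected_if_independent

# Addition rule for events that can both happen.
p_red_or_necklace = p_red + p_necklace - p_red_necklace

print("P(Red) * P(Necklace):", expected_if_independent)
print("P(Red AND Necklace):", p_red_necklace)
print("Independent?", is_independent)
print("P(Red or Necklace):", p_red_or_necklace)
```

Because the product 1/9 differs from the observed joint probability 1/6, the two events are dependent in this data.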


 


Calculating Conditional Probability
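Using the counts from the example (180 items, 60 red, 60 necklaces, 30 red necklaces), the conditional probability follows from P(A | B) = P(A AND B) / P(B). The sketch below works through that calculation and then checks it against Bayes' theorem, P(A | B) = P(B | A) * P(A) / P(B). The variable names are illustrative.

```python
from fractions import Fraction

total_items = 180
p_red = Fraction(60, total_items)
p_necklace = Fraction(60, total_items)
p_red_and_necklace = Fraction(30, total_items)

# Conditional probability: P(Necklace | Red) = P(Red AND Necklace) / P(Red)
p_necklace_given_red = p_red_and_necklace / p_red

# Bayes' theorem: P(Red | Necklace) = P(Necklace | Red) * P(Red) / P(Necklace)
p_red_given_necklace = p_necklace_given_red * p_red / p_necklace

# Direct calculation for comparison: 30 of the 60 necklaces are red.
p_red_given_necklace_direct = Fraction(30, 60)

print("P(Necklace | Red):", p_necklace_given_red)
print("P(Red | Necklace) via Bayes:", p_red_given_necklace)
print("P(Red | Necklace) by counting:", p_red_given_necklace_direct)
```

Both routes to P(Red | Necklace) agree, which is what Bayes' theorem guarantees.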

            


 Bayesian Analysis in Python:


Consider the Carseats dataset, on which we train a Gaussian Naive Bayes classifier to predict whether car seat sales are high or low.


Import the libraries and load the dataset with pandas.


import pandas as pd

from sklearn import metrics  # needed for accuracy_score and confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# path to the Carseats CSV file; adjust it to your local copy
Path = "D:\\DSA Course\\Datasets\\R Inbuilt Datasets\\Carseats.csv"
data = pd.read_csv(Path)

Create a new Sale column to serve as the target: rows with Sales above 4 are labelled 'Yes' and all other rows 'No'. Then drop the original Sales column, factorize the categorical columns, and separate the features from the label.


# label every row; rows with Sales of exactly 4 are assumed to count as 'No'
data['Sale'] = 'No'
data.loc[data.Sales > 4, 'Sale'] = 'Yes'

# keep the text labels, then drop the raw Sales column
class_names = data['Sale']
data = data.loc[:, data.columns != 'Sales']

# convert the categorical columns to integer codes
data['Sale'], _ = pd.factorize(data['Sale'])
data['ShelveLoc'], _ = pd.factorize(data['ShelveLoc'])
data['Urban'], _ = pd.factorize(data['Urban'])
data['US'], _ = pd.factorize(data['US'])
data.info()

# separate the features (X) from the label (Y)
X = data.loc[:, data.columns != 'Sale']
Y = data.Sale

feature_names = X.columns

# hold out 30% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=0)

nb = GaussianNB()
# train the model
nb.fit(X_train, y_train)
# make class predictions for X_test
y_pred_class = nb.predict(X_test)
# calculate accuracy of class predictions
NB_Accuracy = metrics.accuracy_score(y_test, y_pred_class)
# compute the confusion matrix (rows: actual classes, columns: predicted classes)
conf_matrix = metrics.confusion_matrix(y_test, y_pred_class)
print('NB_Accuracy: {:.2f}'.format(NB_Accuracy))

Output:


NB_Accuracy: 0.93
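To connect the classifier back to Bayes' theorem, the sketch below shows roughly what a Gaussian Naive Bayes model computes: a class prior multiplied by a per-feature Gaussian likelihood, normalised into a posterior. This is a simplified illustration in plain Python, not scikit-learn's implementation. The helper names, the toy data, and the small variance constant are all assumptions made for this example.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the prior, feature means and feature variances per class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant (chosen arbitrarily here) avoids zero variance.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_proba(model, row):
    """Posterior P(class | row) via Bayes' theorem, computed in log space."""
    log_scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            # log of the Gaussian density for this feature
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        log_scores[label] = score

    # Normalise so the posteriors sum to 1.
    top = max(log_scores.values())
    unnormalised = {k: math.exp(s - top) for k, s in log_scores.items()}
    total = sum(unnormalised.values())
    return {k: u / total for k, u in unnormalised.items()}


# Toy data with two numeric features (made up for illustration only).
X = [[90, 10], [95, 12], [100, 9], [130, 1], [140, 0], [135, 2]]
y = ["Yes", "Yes", "Yes", "No", "No", "No"]

model = fit_gaussian_nb(X, y)
posterior = predict_proba(model, [98, 11])
prediction = max(posterior, key=posterior.get)
print(prediction, posterior)
```

The "naive" part is the assumption that features are conditionally independent given the class, which is why the per-feature likelihoods are simply multiplied (added in log space).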

For the code file, refer here: https://www.cluzters.ai/vault/274/1029/classification-algorithms-code




