Bayesian Analysis
Probability is a number between 0 and 1 that describes how likely an event is to occur; it can be read as the fraction of times the event occurs over many repeated trials.
If two or more events occur together and they are independent:
P(outcome 1 AND outcome 2) = P(outcome 1) * P(outcome 2)
If two or more events occur together and they are dependent, P(outcome 2 | outcome 1) denotes the conditional probability of outcome 2 occurring given that outcome 1 has already occurred:
P(outcome 1 AND outcome 2) = P(outcome 1) * P(outcome 2 | outcome 1)
If either one event or the other can occur but never both together (the events are mutually exclusive):
P(outcome 1 OR outcome 2) = P(outcome 1) + P(outcome 2)
If either one event or the other can occur and both can also happen together (the events are not mutually exclusive):
P(outcome 1 OR outcome 2) = P(outcome 1) + P(outcome 2) - P(outcome 1 AND outcome 2)
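The multiplication and addition rules can be checked by brute-force counting. The sketch below is only an illustration: the two-dice sample space and the helper name `prob` are assumptions made for this example and do not come from the dataset used later.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of rolling two fair six-sided dice.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate over outcomes), found by counting."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

def first_is_six(o):
    return o[0] == 6

def second_is_six(o):
    return o[1] == 6

# Independent events: P(A AND B) = P(A) * P(B)
p_and = prob(lambda o: first_is_six(o) and second_is_six(o))
assert p_and == prob(first_is_six) * prob(second_is_six)

# Events that can happen together: P(A OR B) = P(A) + P(B) - P(A AND B)
p_or = prob(lambda o: first_is_six(o) or second_is_six(o))
assert p_or == prob(first_is_six) + prob(second_is_six) - p_and

# Events that cannot happen together (sum is 2 vs. sum is 12):
# P(A OR B) = P(A) + P(B)
def sum_is_2(o):
    return sum(o) == 2

def sum_is_12(o):
    return sum(o) == 12

p_excl = prob(lambda o: sum_is_2(o) or sum_is_12(o))
assert p_excl == prob(sum_is_2) + prob(sum_is_12)

print(p_and, p_or, p_excl)
```

Exact fractions are used instead of floats so that the equalities hold without rounding error.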
P(Red) = ?
P(Red) = Total Red/Total Items = 60/180
P(Necklace) = ?
P(Necklace) = Total Necklace/Total Items = 60/180
P(Red Necklace) = ?
P(Red Necklace) = Total Red Necklace/Total Items = 30/180
Calculating Conditional Probability
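As a minimal sketch, the counts from the example above (180 items, 60 red, 60 necklaces, 30 red necklaces) can be plugged into the definition of conditional probability and into Bayes' theorem. The variable names are chosen for this illustration only.

```python
from fractions import Fraction

# Counts taken from the worked example above.
total_items = 180
red = 60
necklace = 60
red_necklace = 30

p_red = Fraction(red, total_items)
p_necklace = Fraction(necklace, total_items)
p_red_and_necklace = Fraction(red_necklace, total_items)

# Conditional probability: P(Necklace | Red) = P(Red AND Necklace) / P(Red)
p_necklace_given_red = p_red_and_necklace / p_red

# Bayes' theorem: P(Red | Necklace) = P(Necklace | Red) * P(Red) / P(Necklace)
p_red_given_necklace = p_necklace_given_red * p_red / p_necklace

print(p_necklace_given_red, p_red_given_necklace)
```

Both values come out to 1/2 here because there are as many red items as necklaces; with unequal counts the two conditional probabilities would differ.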
Bayesian Analysis in Python:
Consider the Carseats dataset, to which we apply a Gaussian naive Bayes classifier to predict car seat sales.
Import the libraries and load the dataset with pandas.
import pandas as pd
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
Path = "D:\\DSA Course\\Datasets\\R Inbuilt Datasets\\Carseats.csv"
data = pd.read_csv(Path)
Label the rows whose Sales value is above 4 as 'Yes' and all other rows as 'No'; this new Sale column is the target to predict. Then drop the original Sales column, factorize the categorical columns, and separate the features from the label.
data.loc[data.Sales > 4, 'Sale'] = 'Yes'
# use <= so that rows with Sales exactly equal to 4 also get a label
data.loc[data.Sales <= 4, 'Sale'] = 'No'
class_names = data['Sale']
# drop the raw Sales column so it cannot leak the target
data = data.loc[:, data.columns != 'Sales']
# encode the string columns as integer codes
data['Sale'], _ = pd.factorize(data['Sale'])
data['ShelveLoc'], _ = pd.factorize(data['ShelveLoc'])
data['Urban'], _ = pd.factorize(data['Urban'])
data['US'], _ = pd.factorize(data['US'])
data.info()
# features are every column except the target
X = data.loc[:, data.columns != 'Sale']
Y = data.Sale
feature_names = X.columns
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=0)
nb = GaussianNB()
# train the model
nb.fit(X_train, y_train)
# make class predictions for X_test
y_pred_class = nb.predict(X_test)
# calculate accuracy of class predictions
NB_Accuracy = metrics.accuracy_score(y_test, y_pred_class)
# compute the confusion matrix (rows: true class, columns: predicted class)
NB_Confusion = metrics.confusion_matrix(y_test, y_pred_class)
print('NB_Accuracy: {:.2f}'.format(NB_Accuracy))
Output:
NB_Accuracy: 0.93
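To make the classifier less of a black box, here is a minimal pure-Python sketch of the idea behind Gaussian naive Bayes: estimate a prior plus a per-feature mean and variance for each class, then pick the class with the highest log posterior. It is not scikit-learn's implementation, which differs in details such as variance smoothing. The function names `fit_gaussian_nb` and `predict_gaussian_nb` and the toy data are hypothetical, made up for this illustration.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # The small constant keeps variances strictly positive
        # (an assumption of this sketch, not scikit-learn's exact rule).
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        # log P(class) + sum of log Gaussian densities, one per feature
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy data (made up for illustration): two features, two well-separated classes.
X_toy = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [5.0, 7.9], [5.2, 8.1], [4.9, 8.0]]
y_toy = ["No", "No", "No", "Yes", "Yes", "Yes"]

toy_model = fit_gaussian_nb(X_toy, y_toy)
print(predict_gaussian_nb(toy_model, [1.1, 2.0]))  # point near the "No" cluster
print(predict_gaussian_nb(toy_model, [5.1, 8.0]))  # point near the "Yes" cluster
```

Working with log probabilities avoids numerical underflow when many small densities are multiplied together. The "naive" part is the assumption that features are independent given the class, which is what allows the per-feature terms to be summed.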
For the code file, refer here: https://www.cluzters.ai/vault/274/1029/classification-algorithms-code