Bayesian Analysis

Raji Reddy A


Probability is a number between 0 and 1 that describes how likely an event is to occur, often interpreted as the fraction of many repeated trials in which the event occurs.

If two or more events occur simultaneously and are independent:

P(outcome 1 AND outcome 2) = P(outcome 1) * P(outcome 2)

If two or more events occur simultaneously and are dependent, P(outcome 2 | outcome 1) denotes the conditional probability of outcome 2 occurring given that outcome 1 has already occurred:

P(outcome 1 AND outcome 2) = P(outcome 1) * P(outcome 2 | outcome 1)
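The two multiplication rules above can be checked with a short sketch. The coin-flip and card-drawing scenarios are illustrative assumptions, not examples from the original article; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Independent events (assumed example): two fair coin flips both land heads.
p_heads = Fraction(1, 2)
p_two_heads = p_heads * p_heads  # P(A AND B) = P(A) * P(B)

# Dependent events (assumed example): drawing two aces from a 52-card deck
# without replacement. The second draw depends on the first.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)  # one ace and one card are gone
p_two_aces = p_first_ace * p_second_ace_given_first  # P(A) * P(B | A)

print(p_two_heads)  # 1/4
print(p_two_aces)   # 1/221
```

In the dependent case, treating the draws as independent would give (4/52) * (4/52), which overstates the probability because it ignores the card already removed.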

Probability of either one event or the other occurring when the events are mutually exclusive (they cannot both happen):

P(outcome 1 or outcome 2) = P(outcome 1) + P(outcome 2)

Probability of either one event or the other occurring when the events are not mutually exclusive (both can happen simultaneously):

P(outcome 1 or outcome 2) = P(outcome 1) + P(outcome 2) - P(outcome 1 AND outcome 2)
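As a rough check of the addition rules, the sketch below enumerates the outcomes of a single fair six-sided die. The die and the helper `prob` are assumptions introduced for illustration only. It shows that the simple sum applies when the events are mutually exclusive, and that the overlap must be subtracted when they are not.

```python
from fractions import Fraction

outcomes = range(1, 7)  # faces of a fair six-sided die (assumed example)

def prob(event):
    """Probability of an event, given as a predicate over die faces."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

# Mutually exclusive: rolling a 1 or rolling a 2 cannot happen together.
p_one_or_two = prob(lambda o: o == 1) + prob(lambda o: o == 2)

# Not mutually exclusive: "even" and "greater than 3" overlap on 4 and 6.
p_even = prob(lambda o: o % 2 == 0)
p_gt3 = prob(lambda o: o > 3)
p_both = prob(lambda o: o % 2 == 0 and o > 3)
p_even_or_gt3 = p_even + p_gt3 - p_both

# Direct enumeration should agree with the formula.
p_direct = prob(lambda o: o % 2 == 0 or o > 3)

print(p_one_or_two)             # 1/3
print(p_even_or_gt3, p_direct)  # 2/3 2/3
```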

Example: in a collection of 180 items, 60 are red, 60 are necklaces, and 30 are red necklaces.

P(Red) = Total Red/Total Items = 60/180

P(Necklace) = Total Necklace/Total Items = 60/180

P(Red Necklace) = Total Red Necklace/Total Items = 30/180

Calculating Conditional Probability
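A minimal sketch of the conditional probability calculation, using the counts given above (180 items, 60 red, 60 necklaces, 30 red necklaces). The variable names are chosen here for illustration. It also applies Bayes' theorem, P(A | B) = P(B | A) * P(A) / P(B), to reverse the condition.

```python
from fractions import Fraction

total_items = 180
red = 60
necklace = 60
red_necklace = 30

p_red = Fraction(red, total_items)
p_necklace = Fraction(necklace, total_items)
p_red_and_necklace = Fraction(red_necklace, total_items)

# Conditional probability:
# P(Red | Necklace) = P(Red AND Necklace) / P(Necklace)
p_red_given_necklace = p_red_and_necklace / p_necklace

# Bayes' theorem:
# P(Necklace | Red) = P(Red | Necklace) * P(Necklace) / P(Red)
p_necklace_given_red = p_red_given_necklace * p_necklace / p_red

print(p_red_given_necklace)  # 1/2
print(p_necklace_given_red)  # 1/2
```

The two results coincide here only because the number of red items equals the number of necklaces; in general P(A | B) and P(B | A) differ.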

Bayesian Analysis in Python:

Consider the Carseats dataset, to which we apply a Gaussian naive Bayes classifier to predict car seat sales.

Import the libraries and load the dataset with pandas:

import pandas as pd

from sklearn import metrics

from sklearn.model_selection import train_test_split

from sklearn.naive_bayes import GaussianNB



Path = "D:\\DSA Course\\Datasets\\R Inbuilt Datasets\\Carseats.csv"

data = pd.read_csv(Path)

Label rows whose Sales value is above 4 as ‘Yes’ and all other rows as ‘No’; this new Sale column is the target to predict. Then separate the feature and label columns, and factorize the categorical columns:

data.loc[data.Sales > 4, 'Sale'] = 'Yes'

# use <= so that rows with Sales exactly 4 are not left unlabeled (NaN)

data.loc[data.Sales <= 4, 'Sale'] = 'No'



class_names = data['Sale']

data = data.loc[:,data.columns != 'Sales']

data['Sale'],_ = pd.factorize(data['Sale'])

data['ShelveLoc'],_ = pd.factorize(data['ShelveLoc'])

data['Urban'],_= pd.factorize(data['Urban'])

data['US'],_ = pd.factorize(data['US'])

data.info()



X = data.loc[:,data.columns != 'Sale']

Y = data.Sale



feature_names = X.columns

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=0)



nb = GaussianNB()

# train the model

nb.fit(X_train, y_train)

# make class predictions for X_test

y_pred_class = nb.predict(X_test)

# calculate accuracy of class predictions

NB_Accuracy = metrics.accuracy_score(y_test, y_pred_class)

# print the confusion matrix

print(metrics.confusion_matrix(y_test, y_pred_class))

print('NB_Accuracy: {:.2f}'.format(NB_Accuracy))

Output:

NB_Accuracy: 0.93
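To connect the classifier back to Bayes' theorem: Gaussian naive Bayes assumes that, within each class, every feature is normally distributed and independent of the other features, then picks the class with the highest posterior. The sketch below is a from-scratch illustration in plain Python, not scikit-learn's implementation. The function names, the `eps` variance floor, and the toy data are all assumptions made for this example.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate a prior plus a per-feature mean and variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # eps keeps the variance strictly positive (hypothetical safeguard)
        variances = [
            sum((v - m) ** 2 for v in col) / n + eps
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model

def predict(model, row):
    """Return the class with the largest log posterior (up to a constant)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)  # log P(class)
        for v, m, var in zip(row, means, variances):
            # log of the normal density: log P(feature | class)
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

# Hypothetical toy data: [price, advertising] -> 0 = 'No', 1 = 'Yes'
X = [[120, 0], [130, 1], [125, 2], [90, 10], [85, 12], [95, 9]]
y = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(X, y)
print(predict(model, [128, 1]))  # close to the class 0 rows
print(predict(model, [88, 11]))  # close to the class 1 rows
```

Working in log space turns the product of per-feature likelihoods into a sum, which avoids numerical underflow when there are many features.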

For the code file, refer here: https://www.cluzters.ai/vault/274/1029/classification-algorithms-code
