Cluzters.ai Bayesian Analysis

Article Information

  • Posted By : Raji Reddy A
  • Posted On : Feb 05, 2021
  • Category : Predictive and Prescriptive Modeling » Machine Learning
  • Description : Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability of a hypothesis as more evidence or information becomes available. Bayesian inference is an important technique in statistics, especially in mathematical statistics.

Overview

  • Bayesian Analysis

    Probability is a numerical description of how likely an event is to occur. In the frequentist interpretation, it is the proportion of times the event occurs over a large number of repeated trials.
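    To see the repeated-trials reading of probability in action, here is a minimal Python sketch that simulates rolls of a fair six-sided die (an illustrative assumption, not part of the original article) and compares the observed frequency of rolling a six with the theoretical 1/6. The helper name relative_frequency is hypothetical.

```python
import random


def relative_frequency(trials: int, seed: int = 42) -> float:
    """Simulate `trials` rolls of a fair die and return the fraction that came up six."""
    rng = random.Random(seed)  # seeded so every run is reproducible
    sixes = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return sixes / trials


for n in (10, 1_000, 100_000):
    print(n, round(relative_frequency(n), 4))
# As n grows, the printed frequency tends to settle near 1/6 (about 0.1667).
```

    With only 10 rolls the frequency can be far from 1/6; the long-run behaviour is what the definition refers to.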

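    If two or more events occur at the same time and are independent (the outcome of one does not change the probability of the other), the probability that they all occur is the product of their individual probabilities:

    P(outcome1 AND outcome2) = P(outcome1) * P(outcome2)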
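    The product rule for independent events can be checked by brute-force enumeration. This sketch assumes a fair coin flip and a fair die roll (illustrative events, not taken from the article) and counts the outcomes in which both "heads" and "six" occur.

```python
from fractions import Fraction
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Every (coin, die) pair is equally likely because the two experiments are independent.
outcomes = list(product(coin, die))
both = sum(1 for c, d in outcomes if c == "H" and d == 6)

p_joint = Fraction(both, len(outcomes))          # counted directly from the sample space
p_product = Fraction(1, 2) * Fraction(1, 6)      # P(heads) * P(six)
print(p_joint, p_product, p_joint == p_product)  # 1/12 1/12 True
```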

    If the events are dependent, we need the conditional probability P(outcome2 | outcome1): the probability of outcome 2 occurring given that outcome 1 has already occurred.

    P(outcome1 AND outcome2) = P(outcome1) * P(outcome2 | outcome1)
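    A standard illustration of dependent events (assumed here, not taken from the article) is drawing two cards from a 52-card deck without replacement: P(first ace AND second ace) = P(first ace) * P(second ace | first ace) = 4/52 * 3/51. The sketch below confirms this by enumerating every ordered pair of distinct cards.

```python
from fractions import Fraction
from itertools import permutations

# A deck as (rank, suit) pairs; only the rank matters for this example.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(r, s) for r in ranks for s in suits]

# Ordered draws of two distinct cards model drawing without replacement.
draws = list(permutations(deck, 2))
both_aces = sum(1 for first, second in draws if first[0] == "A" and second[0] == "A")

p_counted = Fraction(both_aces, len(draws))
p_rule = Fraction(4, 52) * Fraction(3, 51)  # P(ace) * P(ace | first card was an ace)
print(p_counted, p_rule)  # 1/221 1/221
```

    The second factor is 3/51 rather than 4/52 because removing the first ace changes what is left in the deck, which is exactly what makes the events dependent.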

    If the events are mutually exclusive (they cannot both occur), the probability that one or the other occurs is the sum of their probabilities:

    P(outcome1 OR outcome2) = P(outcome1) + P(outcome2)

    If the events are not mutually exclusive (they can occur together), the probability that both occur must be subtracted so that it is not counted twice:

    P(outcome1 OR outcome2) = P(outcome1) + P(outcome2) - P(outcome1 AND outcome2)
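    Both addition rules can be checked on a 52-card deck (an assumed example). "Ace" and "king" never occur on the same card, so their probabilities simply add; "king" and "heart" overlap in the king of hearts, so that card has to be subtracted once. The helper names prob, is_ace, is_king and is_heart are hypothetical.

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(r, s) for r in ranks for s in suits]


def prob(event) -> Fraction:
    """Fraction of the deck for which the predicate `event` is true."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_ace(card):
    return card[0] == "A"


def is_king(card):
    return card[0] == "K"


def is_heart(card):
    return card[1] == "hearts"


# Mutually exclusive events: the plain sum is correct.
p_ace_or_king = prob(lambda c: is_ace(c) or is_king(c))
assert p_ace_or_king == prob(is_ace) + prob(is_king)

# Overlapping events: the king of hearts would otherwise be counted twice.
p_king_or_heart = prob(lambda c: is_king(c) or is_heart(c))
p_king_and_heart = prob(lambda c: is_king(c) and is_heart(c))
assert p_king_or_heart == prob(is_king) + prob(is_heart) - p_king_and_heart

print(p_ace_or_king, p_king_or_heart)  # 2/13 4/13
```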


                    
                   

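    Example: a collection of 180 items contains 60 red items, 60 necklaces, and 30 red necklaces (the red necklaces are counted in both of the first two groups).

    P(Red) = ?

    P(Red) = Total Red / Total Items = 60/180 = 1/3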

     

    P(Necklace) = ?

    P(Necklace) = Total Necklaces / Total Items = 60/180 = 1/3

     

    P(Red AND Necklace) = ?

    P(Red AND Necklace) = Total Red Necklaces / Total Items = 30/180 = 1/6
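    The same numbers can be reproduced in Python with exact fractions. The counts (180 items, 60 red, 60 necklaces, 30 red necklaces) come from the example above; the last two calculations are extra checks, not part of the original example: the general addition rule for P(Red OR Necklace), and a test of whether colour and item type are independent in this collection.

```python
from fractions import Fraction

total_items = 180
total_red = 60
total_necklace = 60
total_red_necklace = 30

p_red = Fraction(total_red, total_items)
p_necklace = Fraction(total_necklace, total_items)
p_red_and_necklace = Fraction(total_red_necklace, total_items)

# General addition rule: red necklaces are in both groups, so subtract them once.
p_red_or_necklace = p_red + p_necklace - p_red_and_necklace

# Independent events would satisfy P(A AND B) == P(A) * P(B); here 1/6 != 1/9.
independent = p_red_and_necklace == p_red * p_necklace

print(p_red, p_necklace, p_red_and_necklace, p_red_or_necklace, independent)
# 1/3 1/3 1/6 1/2 False
```

    Because P(Red AND Necklace) is larger than P(Red) * P(Necklace), knowing that an item is a necklace changes the probability that it is red.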

     

    Calculating Conditional Probability
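    Rearranging the dependent-events rule P(outcome1 AND outcome2) = P(outcome1) * P(outcome2 | outcome1) gives the conditional probability. Applied to the counts in the example above (assuming the question of interest is how likely an item is to be red given that it is a necklace), and then reversed with Bayes' theorem, the worked calculation is:

```latex
P(\text{Red} \mid \text{Necklace})
  = \frac{P(\text{Red AND Necklace})}{P(\text{Necklace})}
  = \frac{30/180}{60/180} = \frac{30}{60} = \frac{1}{2}

P(\text{Necklace} \mid \text{Red})
  = \frac{P(\text{Red} \mid \text{Necklace}) \, P(\text{Necklace})}{P(\text{Red})}
  = \frac{\tfrac{1}{2} \cdot \tfrac{1}{3}}{\tfrac{1}{3}} = \frac{1}{2}
```

    The second result agrees with counting directly: 30 of the 60 red items are necklaces.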

                
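    The description at the top of this article frames Bayesian inference as using Bayes' theorem, P(H | E) = P(E | H) * P(H) / P(E), to update the probability of a hypothesis as evidence arrives. Below is a minimal sketch of that updating loop for an assumed toy problem that is not taken from the article: three candidate coin biases (0.3, 0.5, 0.7) with equal prior probability, updated one flip at a time. The function name update is hypothetical.

```python
from fractions import Fraction


def update(prior: dict, likelihood: dict) -> dict:
    """One application of Bayes' theorem over a discrete set of hypotheses.

    prior[h] is P(h) and likelihood[h] is P(evidence | h).
    Returns P(h | evidence) for every hypothesis h.
    """
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalised.values())  # P(evidence), by the law of total probability
    return {h: p / evidence for h, p in unnormalised.items()}


# Hypotheses: the probability that the coin lands heads.
biases = [Fraction(3, 10), Fraction(1, 2), Fraction(7, 10)]
posterior = {b: Fraction(1, 3) for b in biases}  # equal prior belief in each bias

for flip in "HHTH":  # observed flips, processed one at a time
    likelihood = {b: (b if flip == "H" else 1 - b) for b in biases}
    posterior = update(posterior, likelihood)  # this posterior becomes the next prior

for b, p in posterior.items():
    print(f"P(bias={b}) = {float(p):.3f}")
```

    After three heads in four flips, the 0.7 hypothesis ends up the most probable. Using exact fractions makes it easy to confirm that updating flip by flip gives the same answer as updating on all four flips at once.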

     Bayesian Analysis in Python:

    Consider the Carseats data (from the ISLR R package), to which we apply a Gaussian Naive Bayes classifier to predict whether carseat sales are high.

    Import the libraries and load the dataset with pandas.

    import pandas as pd
    from sklearn import metrics
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    # Local path to Carseats.csv; change this to wherever the file is saved
    Path = "D:\\DSA Course\\Datasets\\R Inbuilt Datasets\\Carseats.csv"
    data = pd.read_csv(Path)


    Create the target column Sale, which is 'Yes' when Sales is above 4 and 'No' otherwise; this is the column we will predict. Then drop the original Sales column, encode the categorical columns as integers with pd.factorize, and separate the features from the label.

    # Sales above 4 -> 'Yes'; everything else (including exactly 4) -> 'No'
    data['Sale'] = 'No'
    data.loc[data.Sales > 4, 'Sale'] = 'Yes'
    
    # Drop the original numeric Sales column so it does not leak the target
    data = data.loc[:, data.columns != 'Sales']
    # Encode the target and the categorical columns as integer codes
    data['Sale'], _ = pd.factorize(data['Sale'])
    data['ShelveLoc'], _ = pd.factorize(data['ShelveLoc'])
    data['Urban'], _ = pd.factorize(data['Urban'])
    data['US'], _ = pd.factorize(data['US'])
    data.info()

    # Features are every column except the target
    X = data.loc[:, data.columns != 'Sale']
    Y = data.Sale

    # Hold out 30% of the rows for testing
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=0)

    nb = GaussianNB()
    # train the model
    nb.fit(X_train, y_train)
    # make class predictions for X_test
    y_pred_class = nb.predict(X_test)
    # calculate accuracy of class predictions
    NB_Accuracy = metrics.accuracy_score(y_test, y_pred_class)
    # compute the confusion matrix (wrap it in print() to display it)
    conf_matrix = metrics.confusion_matrix(y_test, y_pred_class)
    print('NB_Accuracy: {:.2f}'.format(NB_Accuracy))
    

    Output:

    NB_Accuracy: 0.93
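    GaussianNB applies Bayes' theorem under the "naive" assumption that the features are conditionally independent given the class, modelling each feature within each class as a normal distribution and predicting the class with the highest posterior. The pure-Python sketch below illustrates that idea on a tiny made-up dataset; the data and the helper names fit_gaussian_nb, log_gaussian and predict are hypothetical. It is not scikit-learn's implementation: for example, scikit-learn adds var_smoothing times the largest feature variance to every variance, whereas this sketch adds a fixed 1e-9.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_nb(X, y):
    """Estimate the class priors and the per-class, per-feature mean and variance."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        columns = list(zip(*rows))  # one tuple of values per feature
        model[label] = {
            "prior": len(rows) / len(X),
            "means": [mean(col) for col in columns],
            # population variance (divide by n) plus a fixed tiny constant for stability
            "vars": [pvariance(col) + 1e-9 for col in columns],
        }
    return model


def log_gaussian(x, mu, var):
    """Log of the normal density N(x; mu, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def predict(model, x):
    """Return the class with the largest log prior plus summed per-feature log likelihoods."""
    def log_posterior(label):
        params = model[label]
        return math.log(params["prior"]) + sum(
            log_gaussian(xi, mu, var)
            for xi, mu, var in zip(x, params["means"], params["vars"])
        )
    return max(model, key=log_posterior)


# Hypothetical toy data: two features, two classes (0 and 1).
X_train = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.9, 5.2], [4.1, 4.8], [4.0, 5.0]]
y_train = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(X_train, y_train)
X_test = [[1.1, 2.0], [3.8, 5.1]]
y_test = [0, 1]
y_pred = [predict(model, x) for x in X_test]

# Accuracy = fraction of test points whose predicted class matches the true class
accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)
print(y_pred, accuracy)
```

    Working in log space turns the product of per-feature likelihoods into a sum, which avoids numerical underflow when there are many features; the shared denominator P(x) is dropped because it is the same for every class.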

    For the code file, refer here: https://www.cluzters.ai/vault/274/1029/classification-algorithms-code