Cluzters.ai Association Rules and the Apriori Algorithm

Article Information

  • Posted By : Pranav B
  • Posted On : Feb 05, 2021
  • Category : Predictive and Prescriptive Modeling » Machine Learning
  • Description : Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. The Apriori algorithm uses frequent itemsets to generate association rules and is designed to work on databases that contain transactions.

Overview

  • Association Rule Mining
    • Association Rule Learning (also called Association Rule Mining) is a common technique used to find associations between many variables.
    • It is often used by grocery stores, retailers, and anyone with a large transactional database.


    Association Rules

    Association rules are rules of the form X → Y that express an association or correlation between two itemsets X and Y: transactions that contain X tend to also contain Y.
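
    The strength of a rule is commonly measured by support, confidence, and lift, the same quantities used as thresholds in the apyori call later in this article. The sketch below computes them by hand on a toy dataset. The transactions and the function names are invented for illustration and are not part of the original code.

```python
# Toy transactions, invented purely for illustration
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Share of transactions containing the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's support; values near 1 suggest independence."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

# Rule under test: {bread} -> {milk}
antecedent, consequent = {"bread"}, {"milk"}
print("support   ", support(antecedent | consequent, transactions))
print("confidence", confidence(antecedent, consequent, transactions))
print("lift      ", lift(antecedent, consequent, transactions))
```

    On this toy data the rule {bread} → {milk} has a lift below 1, so buying bread makes milk slightly less likely than its baseline frequency. A lift above 1 indicates a positive association.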

           

    Apriori Algorithm

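    The Apriori algorithm mines frequent itemsets and, from them, association rules. It employs a level-wise search for frequent itemsets: frequent itemsets of size k are used to generate candidate itemsets of size k+1, and a candidate is discarded if any of its subsets is infrequent.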

    Example
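
    The level-wise search can be sketched in plain Python as follows. This is a minimal, unoptimized illustration of the idea, not the implementation used by apyori. The function name `frequent_itemsets` and the toy transactions are assumptions made for this example.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset that meets min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support_of(candidate):
        return sum(candidate <= t for t in transactions) / n

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support_of(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep the candidates that reach the support threshold
        current = {}
        for c in candidates:
            s = support_of(c)
            if s >= min_support:
                current[c] = s
    return frequent

# Toy transactions, invented purely for illustration
toy = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = frequent_itemsets(toy, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

    With this toy data, {milk, butter} falls below the threshold at level 2, so the 3-itemset {bread, milk, butter} is pruned without ever being counted. That pruning is what keeps the level-wise search tractable on large databases.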


    Association Rule Mining in Python

    Read the data and convert each row into a transaction

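    import pandas as pd

    # Load the grocery data; cells holding a single space are treated as missing
    data = pd.read_csv('groceries - groceries.csv', na_values=" ")
    # Drop the first column so that only the item columns remain
    data = data.iloc[:, 1:]
    # Build one transaction per row, skipping missing cells
    transactions = []
    for row in data.values:
        transactions.append([str(item) for item in row if not pd.isna(item)])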
    

    Apply apriori algorithm and convert the rules obtained into a list

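    from apyori import apriori

    # Keep only rules that pass all three thresholds.
    # Note: min_length does not appear to be among the keyword arguments that
    # apyori reads (it reads max_length), so it likely has no effect here.
    rules = apriori(transactions, min_support=0.004, min_confidence=0.2, min_lift=3, min_length=2)

    # apriori() returns a generator; materialize it so it can be iterated more than once
    results = list(rules)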
    

    Iterate through the results and create a data frame of items, antecedent, consequent, support, confidence, and lift

    # Each RelationRecord is one frequent itemset; each of its ordered
    # statistics is one rule (antecedent -> consequent) derived from it.
    rows = []
    for record in results:
        for stat in record.ordered_statistics:
            rows.append({
                'Items': set(record.items),
                'Antecedent': set(stat.items_base),
                'Consequent': set(stat.items_add),
                'Support': record.support,
                'Confidence': stat.confidence,
                'Lift': stat.lift,
            })

    results_df = pd.DataFrame(rows, columns=['Items', 'Antecedent', 'Consequent', 'Support', 'Confidence', 'Lift'])

    # Rank the rules by confidence, highest first
    results_df = results_df.sort_values(by='Confidence', ascending=False).reset_index(drop=True)
    results_df.head()
    

    Output:

                                                   Items  ...      Lift
    0  {root vegetables, other vegetables, citrus fru...  ...  4.060694
    1  {root vegetables, other vegetables, citrus fru...  ...  3.273165
    2  {pip fruit, other vegetables, root vegetables,...  ...  3.171368
    3  {yogurt, other vegetables, root vegetables, tr...  ...  3.165495
    4  {pip fruit, other vegetables, whipped/sour cream}  ...  3.123610
    [5 rows x 6 columns]
    

    Apply the algorithm and get predictions on unseen data

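    # Load the unseen transactions (no header row) and drop the first column
    test_df = pd.read_csv('groceries_test_data.csv', header=None)
    test_df = test_df.iloc[:, 1:]
    # Take one transaction (row 135) and use its first two items as the known basket
    test_list = list(test_df.iloc[135, :])
    test_X = test_list[0:2]

    # Expand each antecedent into columns, then keep the rules whose first
    # len(test_X) antecedent items all appear in test_X
    antecedent_items = pd.DataFrame(results_df.Antecedent.tolist())
    mask = antecedent_items.iloc[:, 0:len(test_X)].isin(test_X).all(axis='columns')
    predictions = results_df[mask].reset_index(drop=True)
    predictions[['Consequent', 'Confidence']]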
    

    Output:

               Consequent  Confidence
    0  {other vegetables}    0.785714
    1  {other vegetables}    0.586207
    2    {tropical fruit}    0.321839
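
    The lookup above compares the antecedent's items column by column. An alternative way to express "the antecedent is contained in the basket" is a set subset test, sketched below in plain Python. The function name `matching_rules`, the rule dictionaries, and the confidence values are all made up for illustration. This variant may select different rules than the code above (for example, it also matches single-item antecedents), so it is not a drop-in replacement.

```python
def matching_rules(rules, basket):
    """Return the rules whose antecedent is fully contained in `basket`,
    sorted by confidence (highest first)."""
    basket = set(basket)
    hits = [r for r in rules if set(r["antecedent"]) <= basket]
    return sorted(hits, key=lambda r: r["confidence"], reverse=True)

# Hypothetical rules with made-up confidences, for illustration only
rules = [
    {"antecedent": {"bread"}, "consequent": {"milk"}, "confidence": 0.6},
    {"antecedent": {"bread", "butter"}, "consequent": {"milk"}, "confidence": 0.7},
    {"antecedent": {"eggs"}, "consequent": {"flour"}, "confidence": 0.9},
]

# Rules that apply to a basket already containing bread and butter
for rule in matching_rules(rules, ["bread", "butter"]):
    print(sorted(rule["consequent"]), rule["confidence"])
```

    The subset test ignores the order in which items happen to be stored, which matters because Python sets have no guaranteed ordering.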
    

    For the code file, refer here: https://www.cluzters.ai/vault/274/1030/apriori-algorithm-code