Association Rules
Association rules are if-then statements that capture co-occurrence relationships between itemsets in transactional data. Each rule has an antecedent (the "if" itemset) and a consequent (the "then" itemset), and is evaluated with metrics such as support, confidence, and lift.
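Before mining, it helps to see how the three standard rule metrics are computed. A minimal sketch on toy data (hypothetical items, not the groceries dataset) for the rule {milk} -> {bread}:

```python
# Toy transactions to illustrate support, confidence and lift
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"bread", "butter"},
    {"milk", "bread"},
]
n = len(transactions)

support_milk = sum("milk" in t for t in transactions) / n             # 4/5 = 0.8
support_bread = sum("bread" in t for t in transactions) / n           # 4/5 = 0.8
support_both = sum({"milk", "bread"} <= t for t in transactions) / n  # 3/5 = 0.6

confidence = support_both / support_milk  # P(bread | milk) = 0.75
lift = confidence / support_bread         # 0.75 / 0.8 = 0.9375

print(support_both, confidence, lift)
```

A lift below 1, as here, means buying milk actually makes bread slightly *less* likely than its baseline frequency; the min_lift threshold used later filters out such weak rules.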
Apriori Algorithm
Mine frequent itemsets and association rules using the Apriori algorithm. Apriori employs a level-wise (breadth-first) search: frequent itemsets of size k are joined to generate candidate itemsets of size k+1, and any candidate with an infrequent subset is pruned.
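The level-wise search can be sketched in plain Python (a simplified illustration, not the apyori implementation): frequent k-itemsets are joined to form size-(k+1) candidates, and anything below min_support is dropped at each level.

```python
def apriori_frequent_itemsets(transactions, min_support):
    """Level-wise Apriori sketch: frequent k-itemsets are built only
    from frequent (k-1)-itemsets, shrinking the search space per level."""
    n = len(transactions)
    # Level 1: frequent single items
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items
               if sum(i in t for t in transactions) / n >= min_support]
    frequent = list(current)
    k = 2
    while current:
        # Candidate generation: unions of frequent (k-1)-itemsets of size k
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Support counting: keep only candidates meeting min_support
        current = [c for c in candidates
                   if sum(c <= t for t in transactions) / n >= min_support]
        frequent.extend(current)
        k += 1
    return frequent

# Toy example (hypothetical data, not the groceries file)
txns = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(apriori_frequent_itemsets(txns, min_support=0.6))
```

Here every single item and every pair is frequent at min_support=0.6, but the triple {a, b, c} appears in only 2 of 5 transactions and is pruned.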
Example
Association Rule Mining in Python
Read the data and convert each row into a transaction
import pandas as pd
data = pd.read_csv('groceries - groceries.csv', na_values=" ")
data = data.iloc[:, 1:]
transactions = []
for i in range(data.shape[0]):
    transactions.append([str(data.values[i, j]) for j in range(data.shape[1])
                         if not pd.isna(data.iloc[i, j])])
Apply apriori algorithm and convert the rules obtained into a list
from apyori import apriori
# apyori reads min_support, min_confidence, min_lift and max_length;
# a min_length keyword is silently ignored, so it is omitted here
rules = apriori(transactions, min_support=0.004, min_confidence=0.2, min_lift=3)
results = list(rules)
Iterate through the results and create a data frame of support, lift, confidence, items, antecedent, consequent, count
results_df = pd.DataFrame(columns=('Items', 'Antecedent', 'Consequent', 'Support', 'Confidence', 'Lift'))
Support = []
Confidence = []
Lift = []
Items = []
Antecedent = []
Consequent = []
for record in results:
    for ordered_stat in record.ordered_statistics:
        Support.append(record.support)
        Items.append(record.items)
        Antecedent.append(ordered_stat.items_base)
        Consequent.append(ordered_stat.items_add)
        Confidence.append(ordered_stat.confidence)
        Lift.append(ordered_stat.lift)
results_df['Items'] = list(map(set, Items))
results_df['Antecedent'] = list(map(set, Antecedent))
results_df['Consequent'] = list(map(set, Consequent))
results_df['Support'] = Support
results_df['Confidence'] = Confidence
results_df['Lift'] = Lift
results_df.sort_values(by='Confidence', ascending=False, inplace=True)
results_df.reset_index(inplace=True, drop=True)
results_df.head()
Output:
Items ... Lift
0 {root vegetables, other vegetables, citrus fru... ... 4.060694
1 {root vegetables, other vegetables, citrus fru... ... 3.273165
2 {pip fruit, other vegetables, root vegetables,... ... 3.171368
3 {yogurt, other vegetables, root vegetables, tr... ... 3.165495
4 {pip fruit, other vegetables, whipped/sour cream} ... 3.123610
[5 rows x 6 columns]
Apply the algorithm and get predictions on unseen data
test_df = pd.read_csv('groceries_test_data.csv', header=None)
test_df = test_df.iloc[:, 1:]
test_list = list(test_df.iloc[135, :])
test_list[0:5]  # inspect the first few items of this basket
test_X = test_list[0:2]  # take the first two items as a partial basket
# Keep rules whose first len(test_X) antecedent items all occur in test_X
predictions = results_df[pd.DataFrame(results_df.Antecedent.tolist()).iloc[:, 0:len(test_X)].isin(test_X).all(axis='columns')]
predictions.shape
predictions.reset_index(drop=True, inplace=True)
predictions[['Consequent', "Confidence"]]
Output:
Consequent Confidence
0 {other vegetables} 0.785714
1 {other vegetables} 0.586207
2 {tropical fruit} 0.321839
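The positional isin filter above matches antecedent items by column position, which is fragile when antecedents vary in length or order. A set-subset test is a more robust way to match rules against a basket; a minimal sketch with hypothetical rule data:

```python
import pandas as pd

# Hypothetical rules table (same columns as results_df above)
rules_df = pd.DataFrame({
    "Antecedent": [{"citrus fruit", "root vegetables"}, {"yogurt"}],
    "Consequent": [{"other vegetables"}, {"tropical fruit"}],
    "Confidence": [0.78, 0.32],
})

basket = {"citrus fruit", "root vegetables", "soda"}
# A rule fires when its whole antecedent set is contained in the basket
matches = rules_df[rules_df["Antecedent"].apply(lambda a: a <= basket)]
print(matches[["Consequent", "Confidence"]])
```

Here only the first rule fires, because {citrus fruit, root vegetables} is a subset of the basket while {yogurt} is not, regardless of item order.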
For the full code file, refer here: https://www.cluzters.ai/vault/274/1030/apriori-algorithm-code