
Credit Card Customers Churn Prediction


Model Overview

Problem statement:


Customers leaving a bank's credit card services is one of the most common issues banks face. To tackle it, banks need to know which customers are likely to churn so that they can take the necessary actions to prevent those customers from churning.


Here, we are given a dataset on which we must predict whether a customer is going to leave the credit card service. Customers predicted to churn can then be flagged to the bank manager, who can take the necessary actions to prevent them from leaving the bank.


Who can use it?


This use case can be used by banks to check whether a customer is likely to leave their credit card services.


Data Source:


I found this dataset on Kaggle; the link is below:


 https://www.kaggle.com/sakshigoyal7/credit-card-customers

Dataset Description:


This dataset consists of roughly 10,000 customers, with attributes such as their age, salary, marital status, credit card limit, credit card category, etc.; 18 features are present in total.


The target column is the "Attrition_Flag" column, which has two classes: Existing Customer and Attrited Customer. This column is imbalanced, with far more existing customers than attrited customers.


[Existing Customers: 8500 ; Attrited Customers: 1627]


Detailed description of the columns:



  • CLIENTNUM: Client number. Unique identifier for the customer holding the account

  • Attrition_Flag: Internal event (customer activity) variable – if the account is closed then 1 else 0

  • Customer_Age: Demographic variable – Customer’s Age in Years

  • Gender: M=Male, F=Female

  • Dependent_count: Number of dependents

  • Education_Level: Educational Qualification of the account holder (example: high school, college graduate, etc.)

  • Marital_Status: Married, Single, Divorced, Unknown

  • Income_Category: Annual Income Category of the account holder (< $40K, $40K – $60K, $60K – $80K, $80K – $120K, > $120K, Unknown)

  • Card_Category: Product Variable – Type of Card (Blue, Silver, Gold, Platinum)

  • Months_on_book: Period of relationship with bank

  • Total_Relationship_Count: Total no. of products held by the customer

  • Months_Inactive_12_mon: No. of months inactive in the last 12 months

  • Contacts_Count_12_mon: No. of Contacts in the last 12 months

  • Credit_Limit: Credit Limit on the Credit Card

  • Total_Revolving_Bal: Total Revolving Balance on the Credit Card

  • Avg_Open_To_Buy: Open to Buy Credit Line (Average of last 12 months)

  • Total_Amt_Chng_Q4_Q1: Change in Transaction Amount (Q4 over Q1)

  • Total_Trans_Amt: Total Transaction Amount (Last 12 months)

  • Total_Ct_Chng_Q4_Q1: Change in Transaction Count (Q4 over Q1)

  • Avg_Utilization_Ratio: Average Card Utilization Ratio


 


 Data Pre-processing:


We will drop the CLIENTNUM column, as it provides no useful information for predicting our target value.


We will create a new column "Churn", derived from the Attrition_Flag column, where 0 represents an existing customer and 1 represents an attrited customer.
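
A minimal sketch of these two steps (assuming the Kaggle CSV has already been loaded into a DataFrame called df1; the loading code is not shown in this report, and the exact label strings are taken from the class names above):

# drop the unique identifier, it carries no predictive signal
df1 = df1.drop(columns=['CLIENTNUM'])

# encode the target: 0 = existing customer, 1 = attrited (churned) customer
df1['Churn'] = df1['Attrition_Flag'].map({'Existing Customer': 0, 'Attrited Customer': 1})
df1 = df1.drop(columns=['Attrition_Flag'])   # assumption: the flag is no longer needed once Churn exists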


After plotting the correlation plot, there was correlation among some pairs of columns, i.e. Customer_Age and Months_on_book; Total_Trans_Amt and Total_Trans_Ct; Credit_Limit and Avg_Open_To_Buy.
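
As an illustrative sketch (not necessarily the exact plotting code used here, and assuming seaborn and matplotlib are installed), the correlation plot can be produced like this:

import matplotlib.pyplot as plt
import seaborn as sns

# correlation matrix over the numeric columns only
corr = df1.select_dtypes(include='number').corr()

plt.figure(figsize=(12, 8))
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation between numeric features')
plt.show()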


 



To handle this, we combine each correlated pair into a single, more meaningful column.


So, we can create a column named Royalty by dividing Customer_Age by Months_on_book; it relates the customer's age to the length of their relationship with the bank.


df1['Royalty'] = df1['Customer_Age']/ df1['Months_on_book']

Again, "Avg_Spend_per_yr" column can be created by dividing Total Transaction Amount by Total Transaction Count.


df1['Avg_Spend_per_yr'] = df1['Total_Trans_Amt']/ df1['Total_Trans_Ct']

The Income_Category column had values that needed cleaning, so we replaced the original labels with cleaned ones:


import numpy as np

def preprocess_income(a):
    # normalize the income-bracket labels and mark 'Unknown' as missing
    a['Income_Category'] = a['Income_Category'].replace('Less than $40K', '< 40K')
    a['Income_Category'] = a['Income_Category'].replace('$40K - $60K', '40K - 60K')
    a['Income_Category'] = a['Income_Category'].replace('$60K - $80K', '60K - 80K')
    a['Income_Category'] = a['Income_Category'].replace('$80K - $120K', '80K - 120K')
    a['Income_Category'] = a['Income_Category'].replace('$120K +', '>120K')
    a['Income_Category'] = a['Income_Category'].replace('Unknown', np.nan)

preprocess_income(final_df)


Education_Level, Marital_Status and Income_Category contained a category called "Unknown". To deal with it, we replace "Unknown" with NaN and then map the remaining categories of these ordinal columns to numbers.


final_df['Education_Level'] = final_df['Education_Level'].replace('Unknown', np.nan)
final_df['Marital_Status'] = final_df['Marital_Status'].replace('Unknown', np.nan)

# processing the categorical (ordinal) columns, converting them into numerical type

income_map = {
    '< 40K': 0,
    '40K - 60K': 1,
    '60K - 80K': 2,
    '80K - 120K': 3,
    '>120K': 4,
}

Cardmap = {
    'Blue': 0,
    'Silver': 1,
    'Gold': 2,
    'Platinum': 3,
}

Education_map = {
    'Uneducated': 0,
    'High School': 1,
    'College': 2,
    'Graduate': 3,
    'Post-Graduate': 4,
    'Doctorate': 5,
}

marital = {
    'Married': 2,
    'Single': 1,
    'Divorced': 0,
}

final_df["Income_Category"] = final_df["Income_Category"].map(income_map)
final_df["Card_Category"] = final_df["Card_Category"].map(Cardmap)
final_df["Education_Level"] = final_df["Education_Level"].map(Education_map)
final_df['Marital_Status'] = final_df['Marital_Status'].map(marital)

Now we impute the NaN values with a KNN imputer, so that the formerly "Unknown" entries are estimated from similar records.


from sklearn.impute import KNNImputer

# KNNImputer fills each NaN using the values of the nearest neighbouring rows (k = 5 by default)
knn = KNNImputer()
df_imputed = knn.fit_transform(final_df)   # returns a NumPy array
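
Note that fit_transform returns a plain NumPy array, so if the column names are needed in the later steps, the result can be wrapped back into a DataFrame, for example: final_df = pd.DataFrame(df_imputed, columns=final_df.columns).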

After that we will perform standard scaling on the columns.


from sklearn.preprocessing import StandardScaler

# applying standard scaler: fit on the training data only, then transform both splits
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)

After splitting the data and looking at the target variable, we found that it was imbalanced, so we balanced the training data using SMOTE and then gave this balanced data to our models for training.
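
A minimal sketch of the split-and-resample step (the split ratio, stratification and random seeds are assumptions; SMOTE comes from the imbalanced-learn package, and the names smote_X_train / smote_y_train match the training snippet shown under "Model Used" below):

from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

X = final_df.drop(columns=['Churn'])
y = final_df['Churn']

# hold out a test set first so SMOTE never sees it
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# oversample only the training split to balance the two classes
smote = SMOTE(random_state=42)
smote_X_train, smote_y_train = smote.fit_resample(X_train, y_train)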


Models Evaluated:


I have evaluated 7 models on this dataset:



  • Logistic Regression:


Logistic Regression makes use of the sigmoid function, which maps any real value to a value between 0 and 1. It is used to model the relationship between the independent variables and a dependent variable that is binary in nature, e.g. success/failure, yes/no, true/false.


The sigmoid function is: σ(z) = 1 / (1 + e^(−z))





Accuracy of Logistic Regression: 76 %, F1-score: 48.47%
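
As a hedged illustration (the hyperparameters are assumptions, not the exact code behind the reported scores), a logistic regression can be fit and scored on the resampled training data like this:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

lr = LogisticRegression(max_iter=1000)   # higher max_iter helps convergence
lr.fit(smote_X_train, smote_y_train)
print(accuracy_score(y_test, lr.predict(X_test)))
print(f1_score(y_test, lr.predict(X_test)))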



  • Decision Tree Classifier


The general motive of using a Decision Tree is to create a model that can predict the class or value of the target variable by learning decision rules inferred from prior (training) data. In this model we have used entropy as the splitting criterion, which uses information gain as its metric. ID3 is the decision tree algorithm that uses entropy and information gain.




Entropy(S) = − Σ (i = 1..n) pᵢ · log₂(pᵢ),  where n = number of classes and pᵢ = proportion of samples belonging to class i

Accuracy of Decision Tree Classifier: 85%, F1-score: 57.54%
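
A minimal sketch of this configuration (only the entropy criterion comes from the description above; the other settings are assumptions):

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score

# ID3-style splitting: entropy / information gain as the split criterion
dt = DecisionTreeClassifier(criterion='entropy', random_state=42)
dt.fit(smote_X_train, smote_y_train)
print(accuracy_score(y_test, dt.predict(X_test)))
print(f1_score(y_test, dt.predict(X_test)))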



  • Random Forest Classifier


Ensemble learning is a type of learning where one can combine different algorithms, or the same algorithm many times, to get a more powerful prediction model. A random forest combines hundreds of decision trees and trains each decision tree on a different bootstrap sample of the observations. This concept is called "bagging" and is very popular for its ability to reduce variance and overfitting.





Accuracy of Random Forest Classifier: 91%, F1-score: 69.12%
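
A hedged sketch of how such a bagged ensemble can be trained (the hyperparameters here are assumptions):

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

# bagging: each tree is trained on a different bootstrap sample of the observations
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(smote_X_train, smote_y_train)
print(accuracy_score(y_test, rf.predict(X_test)))
print(f1_score(y_test, rf.predict(X_test)))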



  • SVC


In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. Then we perform classification by finding the hyperplane that best separates the two classes.





Accuracy of SVC: 86%, F1-score: 61.33%
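
A minimal sketch of the SVC step (the default RBF kernel is an assumption; the standard scaling applied earlier matters a great deal for SVMs):

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score

svc = SVC()   # RBF kernel by default
svc.fit(smote_X_train, smote_y_train)
print(accuracy_score(y_test, svc.predict(X_test)))
print(f1_score(y_test, svc.predict(X_test)))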



  • Gradient Boosting Classifier


In gradient boosting, each new model minimizes the loss left by its predecessor using the gradient descent method. This procedure continues until a sufficiently good estimate of the target variable has been achieved. Classification algorithms frequently use the logarithmic loss function, whereas regression algorithms use squared error. There are many standard loss functions supported by gradient boosting classifiers, on the condition that the loss function is differentiable.


As more learners are added to the model, the outputs of the trees are added together to minimize the loss of the prediction. This process employs a procedure similar to gradient descent on the calculated loss, thereby reducing the loss iteratively.




Accuracy of Gradient Boosting Classifier: 81.86%, F1-score:72.06%
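
The concrete GradientBoostingClassifier configuration that was chosen as the final model is shown under "Model Used" below.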



  • XGBoost Classifier


XGBoost stands for eXtreme Gradient Boosting. It is a decision-tree-based ensemble machine learning algorithm that uses a gradient boosting framework. Its objective function is the sum of a specific loss function evaluated over all predictions plus a regularization term over all the predictors (the K trees).



Accuracy of XGBoost Classifier: 80.34 % , F1-score: 68.84%
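
A hedged sketch using the xgboost package (all hyperparameters here are assumptions):

from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, f1_score

xgb = XGBClassifier(n_estimators=200, learning_rate=0.1, eval_metric='logloss')
xgb.fit(smote_X_train, smote_y_train)
print(accuracy_score(y_test, xgb.predict(X_test)))
print(f1_score(y_test, xgb.predict(X_test)))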



  • LGBM Classifier:


LightGBM is a gradient boosting framework that uses tree-based learning algorithms. LightGBM grows trees vertically (leaf-wise) while other tree-based learning algorithms grow trees horizontally (level-wise): it chooses the leaf with the maximum delta loss to grow. When growing the same leaf, a leaf-wise algorithm can reduce more loss than a level-wise algorithm.


The key difference in speed comes from the fact that XGBoost splits the tree one level at a time, whereas LightGBM splits one leaf at a time.
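
A hedged sketch using the lightgbm package (the hyperparameters are assumptions):

from lightgbm import LGBMClassifier
from sklearn.metrics import accuracy_score, f1_score

# leaf-wise growth is LightGBM's default strategy
lgbm = LGBMClassifier(n_estimators=200, learning_rate=0.1)
lgbm.fit(smote_X_train, smote_y_train)
print(accuracy_score(y_test, lgbm.predict(X_test)))
print(f1_score(y_test, lgbm.predict(X_test)))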


(Figure omitted: LightGBM's leaf-wise tree growth compared with the level-wise growth used by other algorithms.)


Model Used:


I have chosen the Gradient Boosting Classifier as the final model for prediction. I trained this model on the SMOTE-resampled training dataset and then checked its efficiency; compared to all the other models it performed best in terms of accuracy, F1-score and the confusion matrix, and it produced the fewest false negatives.


from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report, f1_score

# final model, trained on the SMOTE-balanced training data
gb = GradientBoostingClassifier(n_estimators=500, learning_rate=0.1, max_depth=10, random_state=120)
gb.fit(smote_X_train, smote_y_train)

print("Classification Report for Gradient Boosting Classifier")
print('-'*50)
print(classification_report(y_test, gb.predict(X_test)))
print('-'*50)
print(f1_score(y_test, gb.predict(X_test)))

Classification Report for Gradient Boosting Classifier
-------------------------------------------------------
              precision    recall  f1-score   support

         0.0       0.94      0.96      0.95      1703
         1.0       0.77      0.67      0.72       323

    accuracy                           0.92      2026
   macro avg       0.86      0.82      0.84      2026
weighted avg       0.91      0.92      0.91      2026

-------------------------------------------------------
0.7206611570247934




Solution Efficiency:


For the Gradient Boosting Classifier, the accuracy on the test data was 92% and the F1-score for the attrited class was 72.06%, as shown in the classification report above.

