Nitara Bobal's other Models Reports

Major Concepts

 

Sign-Up/Login to access Several ML Models and also Deploy & Monetize your own ML solutions for free

Models Home » Domain Usecases » Banking and Financial Services » Bank Customer Churn Prediction using Machine Learning

Bank Customer Churn Prediction using Machine Learning

Models Status

Model Overview

Bank Customer Churn Prediction

One of the use cases of machine learning in banking and finance is customer churn prediction. Churn prediction models are used to predict which consumers will close their accounts with the bank and switch to another bank. We can use customer data to be able to predict if a customer will churn or not. If we are able to identify this in time, we will be able to put in effort to make the customer stay, thus improving customer retention and profits.

In this use case, we will be building a banking customer churn classifier model, which will go through a list of bank customers, and predict which customers will churn, or quit the bank, in the near future.

Let's move on to the code.

Importing the libraries

import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

import matplotlib.pyplot as plt

Importing the dataset


data = pd.read_csv('/home/sai/Desktop/Cluzters BFS Usecases/Churn-Modelling-Dataset-master/Churn_Modelling.csv')

data.head(5)


Exploring the dataset


data.describe()


data.tail()


Let's check if our dataset contains any NULL values


# Checking if our dataset contains any NULL values

data.isnull().sum()

RowNumber          0
CustomerId 0
Surname 0
CreditScore 0
Geography 0
Gender 0
Age 0
Tenure 0
Balance 0
NumOfProducts 0
HasCrCard 0
IsActiveMember 0
EstimatedSalary 0
Exited 0
dtype: int64

There are no null values in our dataset

Data Analysis

Checking out the properties of the 'Gender' column


data['Gender'].value_counts()

Male      5457
Female 4543
Name: Gender, dtype: int64

# Plotting the features of the dataset to see the correlation between them

plt.hist(x = data.Gender, bins = 3, color = 'pink')
plt.title('comparison of male and female')
plt.xlabel('Gender')
plt.ylabel('population')
plt.show()



Checking out the properties of the Age Column


data['Age'].value_counts()

37    478
38 477
35 474
36 456
34 447
...
92 2
88 1
82 1
85 1
83 1
Name: Age, Length: 70, dtype: int64

# comparison of age in the dataset

plt.hist(x = data.Age, bins = 10, color = 'orange')
plt.title('comparison of Age')
plt.xlabel('Age')
plt.ylabel('population')
plt.show()



Checking out the properties of the 'Geography' Column


data['Geography'].value_counts()

France     5014
Germany 2509
Spain 2477
Name: Geography, dtype: int64

# comparison of geography

plt.hist(x = data.Geography, bins = 5, color = 'green')
plt.title('comparison of Geography')
plt.xlabel('Geography')
plt.ylabel('population')
plt.show()

 
Checking out the properties of the 'HasCrCard' Column


data['HasCrCard'].value_counts()

1    7055
0 2945
Name: HasCrCard, dtype: int64

# comparision of how many customers hold the credit card

plt.hist(x = data.HasCrCard, bins = 3, color = 'red')
plt.title('how many people have or not have the credit card')
plt.xlabel('customers holding credit card')
plt.ylabel('population')
plt.show()


Checking out the properties of the 'IsActiveMember' Column. Active Members are less likely to churn


data['IsActiveMember'].value_counts()

1    5151
0 4849
Name: IsActiveMember, dtype: int64

# How many active member does the bank have ?

plt.hist(x = data.IsActiveMember, bins = 3, color = 'brown')
plt.title('Active Members')
plt.xlabel('Customers')
plt.ylabel('population')
plt.show()



Breakup of Gender based on Geography


 


# comparison between Geography and Gender

Gender = pd.crosstab(data['Gender'],data['Geography'])
Gender.div(Gender.sum(1).astype(float), axis=0).plot(kind="bar", stacked=True, figsize=(6, 6))

 

Breakup of Card Holders based on Geography


# comparison between geography and card holders

HasCrCard = pd.crosstab(data['HasCrCard'], data['Geography'])
HasCrCard.div(HasCrCard.sum(1).astype(float), axis = 0).plot(kind = 'bar',
stacked = True,figsize = (6, 6))



Comparing ages in different geographies


# comparing ages in different geographies

Age = pd.crosstab(data['Age'], data['Geography'])
Age.div(Age.sum(1).astype(float), axis = 0).plot(kind = 'bar',
stacked = True, figsize = (15,15))




Calculating total balance in France, Germany and Spain



# calculating total balance in france, germany and spain

total_france = data.Balance[data.Geography == 'France'].sum()
total_germany = data.Balance[data.Geography == 'Germany'].sum()
total_spain = data.Balance[data.Geography == 'Spain'].sum()

print("Total Balance in France :",total_france)
print("Total Balance in Germany :",total_germany)
print("Total Balance in Spain :",total_spain)

Total Balance in France : 311332479.49
Total Balance in Germany : 300402861.38
Total Balance in Spain : 153123552.01

# plotting a pie chart

labels = 'France', 'Germany', 'Spain'
colors = ['cyan', 'magenta', 'orange']
sizes = [311, 300, 153]
explode = [ 0.01, 0.01, 0.01]

plt.pie(sizes, colors = colors, labels = labels, explode = explode, shadow = True)

plt.axis('equal')
plt.show()



Data preprocessing


# Removing the unnecassary features from the dataset

data = data.drop(['CustomerId', 'Surname', 'RowNumber'], axis = 1)



print(data.columns)

Index(['CreditScore', 'Geography', 'Gender', 'Age', 'Tenure', 'Balance',
'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary',
'Exited'],
dtype='object')

data.shape

(10000, 11)

# splitting the dataset into x(independent variables) and y(dependent variables)

x = data.iloc[:,0:10]
y = data.iloc[:,10]

print(x.shape)
print(y.shape)

print(x.columns)
print(y)

(10000, 10)
(10000,)
Index(['CreditScore', 'Geography', 'Gender', 'Age', 'Tenure', 'Balance',
'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary'],
dtype='object')
0 1
1 0
2 1
3 0
4 0
..
9995 0
9996 0
9997 1
9998 1
9999 0
Name: Exited, Length: 10000, dtype: int64

# Encoding Categorical variables into numerical variables
# One Hot Encoding

x = pd.get_dummies(x)

x.head()




x.shape

(10000, 13)


Splitting the data into training and testing set


# splitting the data into training and testing set

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)

print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)

(7500, 13)
(7500,)
(2500, 13)
(2500,)


Scaling features between -1 and +1


# Feature Scaling 
# Only on Independent Variable to convert them into values ranging from -1 to +1

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.fit_transform(x_test)

x_train = pd.DataFrame(x_train)
x_train.head()



Machine Learning Models

Decision Tree


from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

model = DecisionTreeClassifier()
model.fit(x_train, y_train)

y_pred = model.predict(x_test)

print("Training Accuracy :", model.score(x_train, y_train))
print("Testing Accuaracy :", model.score(x_test, y_test))

cm = confusion_matrix(y_test, y_pred)
print(cm)

Training Accuracy : 1.0
Testing Accuaracy : 0.8036
[[1724 267]
[ 224 285]]

The test accuracy for Decision Tree is 80.36. This is good, buy let's see if we can do better.


Random Forest


from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(x_train, y_train)

y_pred = model.predict(x_test)

print("Training Accuracy :", model.score(x_train, y_train))
print("Testing Accuracy :", model.score(x_test, y_test))

cm = confusion_matrix(y_test, y_pred)
print(cm)

Training Accuracy : 1.0
Testing Accuracy : 0.8728
[[1920 71]
[ 247 262]]

The test accuracy for Random Forest is 87.28. This is better than the previous Decision Tree Model.

Let's check out the cross validation scores as well as the mean and standard deviation of these cross validation scores.


# k fold cross validation

from sklearn.model_selection import cross_val_score

cvs = cross_val_score(estimator = model, X = x_train, y = y_train, cv = 10)
print(cvs)

[0.86533333 0.84933333 0.86266667 0.85866667 0.85733333 0.852
0.85733333 0.85333333 0.85733333 0.85466667]

print("Mean Accuracy :", cvs.mean())
print("Variance :", cvs.std())

Mean Accuracy : 0.8568
Variance : 0.004548992562461844


Logistic Regression


from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(x_train, y_train)

y_pred = model.predict(x_test)

print("Training Accuracy :", model.score(x_train, y_train))
print("Testing Accuracy :", model.score(x_test, y_test))

cm = confusion_matrix(y_test, y_pred)
print(cm)

Training Accuracy : 0.8096
Testing Accuracy : 0.8092
[[1916 75]
[ 402 107]]

The test accuracy of the Logistic Regression Model is 0.8092. This is good, but not as good as the Random Forest Model.


Support Vector Machine


from sklearn.svm import SVC

model = SVC()
model.fit(x_train, y_train)

y_pred = model.predict(x_test)

print("Training Accuracy :", model.score(x_train, y_train))
print("Testing Accuracy :", model.score(x_test, y_test))

cm = confusion_matrix(y_test, y_pred)
print(cm)

Training Accuracy : 0.8625333333333334
Testing Accuracy : 0.8616
[[1951 40]
[ 306 203]]

The test accuracy of the Support Vector Machine Model is 0.8616. This is good, but not as good as the Random Forest Model.

Let's check out the cross validation scores as well as the mean and standard deviation of these cross validation scores.


# k fold cross validation

from sklearn.model_selection import cross_val_score

cvs = cross_val_score(estimator = model, X = x_train, y = y_train, cv = 10)
print(cvs)

[0.864      0.852      0.864      0.85733333 0.84266667 0.844
0.852 0.85333333 0.84533333 0.85066667]

print("Mean Accuracy :", cvs.mean())
print("Variance :", cvs.std())

Mean Accuracy : 0.8525333333333333
Variance : 0.007160384843785353


Multi Layer Perceptron


from sklearn.neural_network import MLPClassifier

model = MLPClassifier(hidden_layer_sizes = (100, 100), activation ='relu',
solver = 'adam', max_iter = 50)
model.fit(x_train, y_train)

y_pred = model.predict(x_test)

print("Training Accuracy :", model.score(x_train, y_train))
print("Testing Accuracy :", model.score(x_test, y_test))

cm = confusion_matrix(y_test, y_pred)
print(cm)

Training Accuracy : 0.8896
Testing Accuracy : 0.86
[[1885 106]
[ 244 265]]

The test accuracy of the Multi Layer Perceptron Model is 0.86. This is good, but not as good as the Random Forest Model.


Artificial Neural Networks


import keras
from keras.models import Sequential
from keras.layers import Dense

Using TensorFlow backend.

# creating the model
model = Sequential()

# first hidden layer
model.add(Dense(output_dim = 8, init = 'uniform', activation = 'relu', input_dim = 13))

# second hidden layer
model.add(Dense(output_dim = 8, init = 'uniform', activation = 'relu'))

# third hidden layer
model.add(Dense(output_dim = 8, init = 'uniform', activation = 'relu'))

# fourth hidden layer
model.add(Dense(output_dim = 8, init = 'uniform', activation = 'relu'))

# fifth hidden layer
model.add(Dense(output_dim = 8, init = 'uniform', activation = 'relu'))

# output layer
model.add(Dense(output_dim = 1, init = 'uniform', activation = 'sigmoid'))

# Compiling the NN
# binary_crossentropy loss function used when a binary output is expected
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

model.fit(x_train, y_train, batch_size = 10, nb_epoch = 50)

Epoch 1/50
7500/7500 [==============================] - 2s 297us/step - loss: 0.4921 - acc: 0.7963
Epoch 2/50
7500/7500 [==============================] - 1s 156us/step - loss: 0.4336 - acc: 0.7963
Epoch 3/50
7500/7500 [==============================] - 1s 155us/step - loss: 0.4292 - acc: 0.7963
Epoch 4/50
7500/7500 [==============================] - 1s 156us/step - loss: 0.4266 - acc: 0.7967
Epoch 5/50
7500/7500 [==============================] - 1s 167us/step - loss: 0.4229 - acc: 0.8108
Epoch 6/50
7500/7500 [==============================] - 1s 155us/step - loss: 0.4190 - acc: 0.8207
Epoch 7/50
7500/7500 [==============================] - 1s 155us/step - loss: 0.4144 - acc: 0.8271
Epoch 8/50
7500/7500 [==============================] - 1s 167us/step - loss: 0.4109 - acc: 0.8300
Epoch 9/50
7500/7500 [==============================] - 1s 156us/step - loss: 0.4076 - acc: 0.8304
Epoch 1/50
7500/7500 [==============================] - 2s 297us/step - loss: 0.4921 - acc: 0.7963
Epoch 2/50
7500/7500 [==============================] - 1s 156us/step - loss: 0.4336 - acc: 0.7963
Epoch 3/50
7500/7500 [==============================] - 1s 155us/step - loss: 0.4292 - acc: 0.7963
Epoch 4/50
7500/7500 [==============================] - 1s 156us/step - loss: 0.4266 - acc: 0.7967
Epoch 5/50
7500/7500 [==============================] - 1s 167us/step - loss: 0.4229 - acc: 0.8108
Epoch 6/50
7500/7500 [==============================] - 1s 155us/step - loss: 0.4190 - acc: 0.8207
Epoch 7/50
7500/7500 [==============================] - 1s 155us/step - loss: 0.4144 - acc: 0.8271
Epoch 8/50
7500/7500 [==============================] - 1s 167us/step - loss: 0.4109 - acc: 0.8300
Epoch 9/50
7500/7500 [==============================] - 1s 156us/step - loss: 0.4076 - acc: 0.8304
Epoch 10/50
7500/7500 [==============================] - 1s 154us/step - loss: 0.4068 - acc: 0.8320
Epoch 11/50
7500/7500 [==============================] - 1s 155us/step - loss: 0.4054 - acc: 0.8347
Epoch 12/50
7500/7500 [==============================] - 1s 157us/step - loss: 0.4053 - acc: 0.8339
Epoch 13/50
7500/7500 [==============================] - 1s 156us/step - loss: 0.4034 - acc: 0.8336
Epoch 14/50
7500/7500 [==============================] - 1s 156us/step - loss: 0.4028 - acc: 0.8351
Epoch 15/50
7500/7500 [==============================] - 1s 160us/step - loss: 0.4038 - acc: 0.8333
Epoch 16/50
7500/7500 [==============================] - 1s 156us/step - loss: 0.4024 - acc: 0.8340
Epoch 17/50
7500/7500 [==============================] - 1s 159us/step - loss: 0.4025 - acc: 0.8332
Epoch 18/50
7500/7500 [==============================] - 1s 159us/step - loss: 0.4023 - acc: 0.8361
Epoch 19/50
7500/7500 [==============================] - 1s 155us/step - loss: 0.4021 - acc: 0.8341
Epoch 20/50
7500/7500 [==============================] - 1s 156us/step - loss: 0.4020 - acc: 0.8333
Epoch 21/50
7500/7500 [==============================] - 1s 161us/step - loss: 0.4013 - acc: 0.8331
Epoch 22/50
7500/7500 [==============================] - 1s 156us/step - loss: 0.4013 - acc: 0.8363
Epoch 23/50
7500/7500 [==============================] - 1s 156us/step - loss: 0.4025 - acc: 0.8352
Epoch 24/50
7500/7500 [==============================] - 1s 154us/step - loss: 0.4015 - acc: 0.8353
Epoch 25/50
7500/7500 [==============================] - 1s 155us/step - loss: 0.4018 - acc: 0.8356
Epoch 26/50
7500/7500 [==============================] - 1s 158us/step - loss: 0.4011 - acc: 0.8332
Epoch 27/50
7500/7500 [==============================] - 1s 153us/step - loss: 0.4016 - acc: 0.8359
Epoch 28/50
7500/7500 [==============================] - 1s 154us/step - loss: 0.4015 - acc: 0.8345
Epoch 29/50
7500/7500 [==============================] - 1s 154us/step - loss: 0.4011 - acc: 0.8347
Epoch 30/50
7500/7500 [==============================] - 1s 156us/step - loss: 0.4005 - acc: 0.8352
Epoch 31/50
7500/7500 [==============================] - 1s 156us/step - loss: 0.4004 - acc: 0.8351
Epoch 32/50
7500/7500 [==============================] - 1s 157us/step - loss: 0.4016 - acc: 0.8359
Epoch 33/50
7500/7500 [==============================] - 1s 157us/step - loss: 0.4007 - acc: 0.8355
Epoch 34/50
7500/7500 [==============================] - 1s 154us/step - loss: 0.4006 - acc: 0.8360
Epoch 35/50
7500/7500 [==============================] - 1s 155us/step - loss: 0.4008 - acc: 0.8372
Epoch 36/50
7500/7500 [==============================] - 1s 155us/step - loss: 0.4009 - acc: 0.8352
Epoch 37/50
7500/7500 [==============================] - 1s 160us/step - loss: 0.4001 - acc: 0.8367
Epoch 38/50
7500/7500 [==============================] - 1s 159us/step - loss: 0.4004 - acc: 0.8361
Epoch 39/50
7500/7500 [==============================] - 1s 155us/step - loss: 0.4008 - acc: 0.8336
Epoch 40/50
7500/7500 [==============================] - 1s 155us/step - loss: 0.3999 - acc: 0.8348
Epoch 41/50
7500/7500 [==============================] - 1s 154us/step - loss: 0.3997 - acc: 0.8373
Epoch 42/50
7500/7500 [==============================] - 1s 153us/step - loss: 0.3998 - acc: 0.8364
Epoch 43/50
7500/7500 [==============================] - 1s 158us/step - loss: 0.3995 - acc: 0.8359
Epoch 44/50
7500/7500 [==============================] - 1s 154us/step - loss: 0.3996 - acc: 0.8367
Epoch 45/50
7500/7500 [==============================] - 1s 156us/step - loss: 0.3992 - acc: 0.8368
Epoch 46/50
7500/7500 [==============================] - 1s 155us/step - loss: 0.3993 - acc: 0.8383
Epoch 47/50
7500/7500 [==============================] - 1s 154us/step - loss: 0.3992 - acc: 0.8392
Epoch 48/50
7500/7500 [==============================] - 1s 154us/step - loss: 0.3993 - acc: 0.8373
Epoch 49/50
7500/7500 [==============================] - 1s 156us/step - loss: 0.3992 - acc: 0.8381
Epoch 50/50
7500/7500 [==============================] - 1s 155us/step - loss: 0.3990 - acc: 0.8387

Let's try adding dropout layers, and see how it affects performance.


# creating the model
model = Sequential()

from keras.layers import Dropout

# first hidden layer
model.add(Dense(output_dim = 8, init = 'uniform', activation = 'relu', input_dim = 13))
model.add(Dropout(0.5))

# second hidden layer
model.add(Dense(output_dim = 8, init = 'uniform', activation = 'relu'))
model.add(Dropout(0.5))

# output layer
model.add(Dense(output_dim = 1, init = 'uniform', activation = 'sigmoid'))

# Compiling the NN
# binary_crossentropy loss function used when a binary output is expected
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

model.fit(x_train, y_train, batch_size = 10, nb_epoch = 50)

Epoch 1/50
7500/7500 [==============================] - 1s 191us/step - loss: 0.5401 - acc: 0.7951
Epoch 2/50
7500/7500 [==============================] - 1s 162us/step - loss: 0.4751 - acc: 0.7963
Epoch 3/50
7500/7500 [==============================] - 1s 158us/step - loss: 0.4703 - acc: 0.7963
Epoch 4/50
7500/7500 [==============================] - 1s 158us/step - loss: 0.4665 - acc: 0.7963
Epoch 5/50
7500/7500 [==============================] - 1s 160us/step - loss: 0.4666 - acc: 0.7963
Epoch 6/50
7500/7500 [==============================] - 1s 155us/step - loss: 0.4627 - acc: 0.7963
Epoch 7/50
7500/7500 [==============================] - 1s 185us/step - loss: 0.4663 - acc: 0.7963
Epoch 8/50
7500/7500 [==============================] - 3s 337us/step - loss: 0.4638 - acc: 0.7963
Epoch 9/50
7500/7500 [==============================] - 1s 167us/step - loss: 0.4655 - acc: 0.7963
Epoch 10/50
7500/7500 [==============================] - 1s 160us/step - loss: 0.4641 - acc: 0.7963
Epoch 11/50
7500/7500 [==============================] - 1s 160us/step - loss: 0.4654 - acc: 0.7963
Epoch 12/50
7500/7500 [==============================] - 1s 179us/step - loss: 0.4629 - acc: 0.7963
Epoch 13/50
7500/7500 [==============================] - 1s 160us/step - loss: 0.4612 - acc: 0.7963
Epoch 14/50
7500/7500 [==============================] - 1s 165us/step - loss: 0.4565 - acc: 0.7963
Epoch 15/50
7500/7500 [==============================] - 1s 167us/step - loss: 0.4615 - acc: 0.7963
Epoch 16/50
7500/7500 [==============================] - 1s 161us/step - loss: 0.4561 - acc: 0.7963
Epoch 17/50
7500/7500 [==============================] - 1s 158us/step - loss: 0.4594 - acc: 0.7963
Epoch 18/50
7500/7500 [==============================] - 1s 158us/step - loss: 0.4579 - acc: 0.7963
Epoch 19/50
7500/7500 [==============================] - 1s 160us/step - loss: 0.4581 - acc: 0.7963
Epoch 20/50
7500/7500 [==============================] - 1s 160us/step - loss: 0.4585 - acc: 0.7963
Epoch 21/50
7500/7500 [==============================] - 1s 158us/step - loss: 0.4549 - acc: 0.7963
Epoch 22/50
7500/7500 [==============================] - 1s 158us/step - loss: 0.4598 - acc: 0.7963
Epoch 23/50
7500/7500 [==============================] - 1s 158us/step - loss: 0.4601 - acc: 0.7963
Epoch 24/50
7500/7500 [==============================] - 1s 163us/step - loss: 0.4580 - acc: 0.7963
Epoch 25/50
7500/7500 [==============================] - 1s 159us/step - loss: 0.4565 - acc: 0.7963
Epoch 26/50
7500/7500 [==============================] - 1s 161us/step - loss: 0.4603 - acc: 0.7963
Epoch 27/50
7500/7500 [==============================] - 1s 158us/step - loss: 0.4554 - acc: 0.7963
Epoch 28/50
7500/7500 [==============================] - 1s 160us/step - loss: 0.4602 - acc: 0.7963
Epoch 29/50
7500/7500 [==============================] - 1s 163us/step - loss: 0.4592 - acc: 0.7963
Epoch 30/50
7500/7500 [==============================] - 1s 161us/step - loss: 0.4584 - acc: 0.7963
Epoch 31/50
7500/7500 [==============================] - 1s 162us/step - loss: 0.4587 - acc: 0.7963
Epoch 32/50
7500/7500 [==============================] - 1s 161us/step - loss: 0.4610 - acc: 0.7963
Epoch 33/50
7500/7500 [==============================] - 1s 163us/step - loss: 0.4602 - acc: 0.7963
Epoch 34/50
7500/7500 [==============================] - 1s 160us/step - loss: 0.4591 - acc: 0.7963
Epoch 35/50
7500/7500 [==============================] - 1s 162us/step - loss: 0.4565 - acc: 0.7963
Epoch 36/50
7500/7500 [==============================] - 1s 161us/step - loss: 0.4620 - acc: 0.7963
Epoch 37/50
7500/7500 [==============================] - 1s 158us/step - loss: 0.4593 - acc: 0.7963
Epoch 38/50
7500/7500 [==============================] - 1s 164us/step - loss: 0.4587 - acc: 0.7963
Epoch 39/50
7500/7500 [==============================] - 1s 158us/step - loss: 0.4587 - acc: 0.7963
Epoch 40/50
7500/7500 [==============================] - 1s 158us/step - loss: 0.4575 - acc: 0.7963
Epoch 41/50
7500/7500 [==============================] - 1s 158us/step - loss: 0.4583 - acc: 0.7963
Epoch 42/50
7500/7500 [==============================] - 1s 159us/step - loss: 0.4585 - acc: 0.7963
Epoch 43/50
7500/7500 [==============================] - 1s 160us/step - loss: 0.4628 - acc: 0.7963
Epoch 44/50
7500/7500 [==============================] - 1s 163us/step - loss: 0.4596 - acc: 0.7963
Epoch 45/50
7500/7500 [==============================] - 1s 161us/step - loss: 0.4602 - acc: 0.7963
Epoch 46/50
7500/7500 [==============================] - 1s 160us/step - loss: 0.4608 - acc: 0.7963
Epoch 47/50
7500/7500 [==============================] - 1s 159us/step - loss: 0.4586 - acc: 0.7963
Epoch 48/50
7500/7500 [==============================] - 1s 159us/step - loss: 0.4543 - acc: 0.7963
Epoch 49/50
7500/7500 [==============================] - 1s 160us/step - loss: 0.4593 - acc: 0.7963
Epoch 50/50
7500/7500 [==============================] - 1s 161us/step - loss: 0.4598 - acc: 0.7963

<keras.callbacks.History at 0x7fb79414b9b0>

 The test accuracy is reduced. Maybe we should try reducing probability of dropout.


# creating the model
model = Sequential()

# first hidden layer
model.add(Dense(output_dim = 8, init = 'uniform', activation = 'relu', input_dim = 13))
model.add(Dropout(0.1))

# second hidden layer
model.add(Dense(output_dim = 8, init = 'uniform', activation = 'relu'))
model.add(Dropout(0.1))

# output layer
model.add(Dense(output_dim = 1, init = 'uniform', activation = 'sigmoid'))

# Compiling the NN
# binary_crossentropy loss function used when a binary output is expected
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

model.fit(x_train, y_train, batch_size = 10, nb_epoch = 50)

Epoch 1/50
7500/7500 [==============================] - 2s 204us/step - loss: 0.4842 - acc: 0.7951
Epoch 2/50
7500/7500 [==============================] - 1s 163us/step - loss: 0.4363 - acc: 0.7963
Epoch 3/50
7500/7500 [==============================] - 1s 162us/step - loss: 0.4291 - acc: 0.7963
Epoch 4/50
7500/7500 [==============================] - 1s 162us/step - loss: 0.4273 - acc: 0.7963
Epoch 5/50
7500/7500 [==============================] - 1s 162us/step - loss: 0.4267 - acc: 0.8176
Epoch 6/50
7500/7500 [==============================] - 1s 160us/step - loss: 0.4268 - acc: 0.8215
Epoch 7/50
7500/7500 [==============================] - 1s 161us/step - loss: 0.4232 - acc: 0.8237
Epoch 8/50
7500/7500 [==============================] - 1s 164us/step - loss: 0.4229 - acc: 0.8236
Epoch 9/50
7500/7500 [==============================] - 1s 163us/step - loss: 0.4212 - acc: 0.8264
Epoch 10/50
7500/7500 [==============================] - 1s 188us/step - loss: 0.4222 - acc: 0.8269
Epoch 11/50
7500/7500 [==============================] - 2s 212us/step - loss: 0.4227 - acc: 0.8276
Epoch 12/50
7500/7500 [==============================] - 1s 198us/step - loss: 0.4208 - acc: 0.8299
Epoch 13/50
7500/7500 [==============================] - 1s 196us/step - loss: 0.4216 - acc: 0.8297
Epoch 14/50
7500/7500 [==============================] - 2s 204us/step - loss: 0.4203 - acc: 0.8291
Epoch 15/50
7500/7500 [==============================] - 1s 198us/step - loss: 0.4180 - acc: 0.8297
Epoch 16/50
7500/7500 [==============================] - 2s 200us/step - loss: 0.4151 - acc: 0.8296
Epoch 17/50
7500/7500 [==============================] - 2s 200us/step - loss: 0.4200 - acc: 0.8308
Epoch 18/50
7500/7500 [==============================] - 1s 200us/step - loss: 0.4180 - acc: 0.8307
Epoch 19/50
7500/7500 [==============================] - 1s 199us/step - loss: 0.4206 - acc: 0.8300
Epoch 20/50
7500/7500 [==============================] - 1s 192us/step - loss: 0.4198 - acc: 0.8304
Epoch 21/50
7500/7500 [==============================] - 1s 195us/step - loss: 0.4192 - acc: 0.8313
Epoch 22/50
7500/7500 [==============================] - 1s 197us/step - loss: 0.4200 - acc: 0.8313
Epoch 23/50
7500/7500 [==============================] - 1s 196us/step - loss: 0.4242 - acc: 0.8309
Epoch 24/50
7500/7500 [==============================] - 1s 195us/step - loss: 0.4191 - acc: 0.8303
Epoch 25/50
7500/7500 [==============================] - 2s 201us/step - loss: 0.4212 - acc: 0.8292
Epoch 26/50
7500/7500 [==============================] - 1s 198us/step - loss: 0.4196 - acc: 0.8304
Epoch 27/50
7500/7500 [==============================] - 1s 200us/step - loss: 0.4189 - acc: 0.8329
Epoch 28/50
7500/7500 [==============================] - 2s 205us/step - loss: 0.4186 - acc: 0.8320
Epoch 29/50
7500/7500 [==============================] - 2s 211us/step - loss: 0.4214 - acc: 0.8309
Epoch 30/50
7500/7500 [==============================] - 2s 215us/step - loss: 0.4188 - acc: 0.8299
Epoch 31/50
7500/7500 [==============================] - 2s 303us/step - loss: 0.4194 - acc: 0.8320
Epoch 32/50
7500/7500 [==============================] - 3s 397us/step - loss: 0.4202 - acc: 0.8324
Epoch 33/50
7500/7500 [==============================] - 2s 292us/step - loss: 0.4171 - acc: 0.8329
Epoch 34/50
7500/7500 [==============================] - 1s 194us/step - loss: 0.4201 - acc: 0.8295
Epoch 35/50
7500/7500 [==============================] - 1s 198us/step - loss: 0.4166 - acc: 0.8285
Epoch 36/50
7500/7500 [==============================] - 2s 265us/step - loss: 0.4181 - acc: 0.8315
Epoch 37/50
7500/7500 [==============================] - 1s 181us/step - loss: 0.4198 - acc: 0.8304
Epoch 38/50
7500/7500 [==============================] - 1s 164us/step - loss: 0.4172 - acc: 0.8327
Epoch 39/50
7500/7500 [==============================] - 1s 162us/step - loss: 0.4188 - acc: 0.8316
Epoch 40/50
7500/7500 [==============================] - 2s 275us/step - loss: 0.4201 - acc: 0.8316
Epoch 41/50
7500/7500 [==============================] - 2s 219us/step - loss: 0.4197 - acc: 0.8337
Epoch 42/50
7500/7500 [==============================] - 1s 178us/step - loss: 0.4179 - acc: 0.8316
Epoch 43/50
7500/7500 [==============================] - 2s 216us/step - loss: 0.4189 - acc: 0.8312
Epoch 44/50
7500/7500 [==============================] - 1s 175us/step - loss: 0.4178 - acc: 0.8308
Epoch 45/50
7500/7500 [==============================] - 2s 213us/step - loss: 0.4146 - acc: 0.8309
Epoch 46/50
7500/7500 [==============================] - 1s 191us/step - loss: 0.4161 - acc: 0.8303
Epoch 47/50
7500/7500 [==============================] - 2s 224us/step - loss: 0.4177 - acc: 0.8301
Epoch 48/50
7500/7500 [==============================] - 2s 203us/step - loss: 0.4192 - acc: 0.8301
Epoch 49/50
7500/7500 [==============================] - 1s 170us/step - loss: 0.4189 - acc: 0.8307
Epoch 50/50
7500/7500 [==============================] - 1s 162us/step - loss: 0.4164 - acc: 0.8312

<keras.callbacks.History at 0x7fb77cab9748>

# creating the model
model = Sequential()

# first hidden layer
model.add(Dense(output_dim = 8, init = 'uniform', activation = 'relu', input_dim = 13))

# second hidden layer
model.add(Dense(output_dim = 8, init = 'uniform', activation = 'relu'))

# output layer
model.add(Dense(output_dim = 1, init = 'uniform', activation = 'sigmoid'))

# Compiling the NN
# binary_crossentropy loss function used when a binary output is expected
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

model.fit(x_train, y_train, batch_size = 10, nb_epoch = 50)

Epoch 1/50
7500/7500 [==============================] - 3s 375us/step - loss: 0.4805 - acc: 0.7964
Epoch 2/50
7500/7500 [==============================] - 3s 396us/step - loss: 0.4283 - acc: 0.7963
Epoch 3/50
7500/7500 [==============================] - 1s 163us/step - loss: 0.4228 - acc: 0.8112
Epoch 4/50
7500/7500 [==============================] - 1s 166us/step - loss: 0.4174 - acc: 0.8235
Epoch 5/50
7500/7500 [==============================] - 2s 232us/step - loss: 0.4131 - acc: 0.8312
Epoch 6/50
7500/7500 [==============================] - 2s 230us/step - loss: 0.4095 - acc: 0.8319
Epoch 7/50
7500/7500 [==============================] - 1s 190us/step - loss: 0.4058 - acc: 0.8360
Epoch 8/50
7500/7500 [==============================] - 1s 189us/step - loss: 0.4051 - acc: 0.8313
Epoch 9/50
7500/7500 [==============================] - 1s 196us/step - loss: 0.4028 - acc: 0.8335
Epoch 10/50
7500/7500 [==============================] - 1s 187us/step - loss: 0.4015 - acc: 0.8351
Epoch 1/50
7500/7500 [==============================] - 3s 375us/step - loss: 0.4805 - acc: 0.7964
Epoch 2/50
7500/7500 [==============================] - 3s 396us/step - loss: 0.4283 - acc: 0.7963
Epoch 3/50
7500/7500 [==============================] - 1s 163us/step - loss: 0.4228 - acc: 0.8112
Epoch 4/50
7500/7500 [==============================] - 1s 166us/step - loss: 0.4174 - acc: 0.8235
Epoch 5/50
7500/7500 [==============================] - 2s 232us/step - loss: 0.4131 - acc: 0.8312
Epoch 6/50
7500/7500 [==============================] - 2s 230us/step - loss: 0.4095 - acc: 0.8319
Epoch 7/50
7500/7500 [==============================] - 1s 190us/step - loss: 0.4058 - acc: 0.8360
Epoch 8/50
7500/7500 [==============================] - 1s 189us/step - loss: 0.4051 - acc: 0.8313
Epoch 9/50
7500/7500 [==============================] - 1s 196us/step - loss: 0.4028 - acc: 0.8335
Epoch 10/50
7500/7500 [==============================] - 1s 187us/step - loss: 0.4015 - acc: 0.8351
Epoch 11/50
7500/7500 [==============================] - 1s 189us/step - loss: 0.4009 - acc: 0.8328
Epoch 12/50
7500/7500 [==============================] - 1s 197us/step - loss: 0.4001 - acc: 0.8331
Epoch 13/50
7500/7500 [==============================] - 1s 187us/step - loss: 0.3993 - acc: 0.8333
Epoch 14/50
7500/7500 [==============================] - 1s 189us/step - loss: 0.3987 - acc: 0.8348
Epoch 15/50
7500/7500 [==============================] - 1s 187us/step - loss: 0.3982 - acc: 0.8327
Epoch 16/50
7500/7500 [==============================] - 1s 188us/step - loss: 0.3977 - acc: 0.8357
Epoch 17/50
7500/7500 [==============================] - 1s 188us/step - loss: 0.3972 - acc: 0.8331
Epoch 18/50
7500/7500 [==============================] - 1s 185us/step - loss: 0.3968 - acc: 0.8327
Epoch 19/50
7500/7500 [==============================] - 1s 190us/step - loss: 0.3975 - acc: 0.8349
Epoch 20/50
7500/7500 [==============================] - 2s 201us/step - loss: 0.3966 - acc: 0.8344
Epoch 21/50
7500/7500 [==============================] - 1s 198us/step - loss: 0.3963 - acc: 0.8341
Epoch 22/50
7500/7500 [==============================] - 2s 201us/step - loss: 0.3957 - acc: 0.8340
Epoch 23/50
7500/7500 [==============================] - 1s 188us/step - loss: 0.3956 - acc: 0.8363
Epoch 24/50
7500/7500 [==============================] - 1s 187us/step - loss: 0.3958 - acc: 0.8359
Epoch 25/50
7500/7500 [==============================] - 1s 179us/step - loss: 0.3957 - acc: 0.8349
Epoch 26/50
7500/7500 [==============================] - 1s 187us/step - loss: 0.3956 - acc: 0.8349
Epoch 27/50
7500/7500 [==============================] - 1s 184us/step - loss: 0.3956 - acc: 0.8365
Epoch 28/50
7500/7500 [==============================] - 1s 191us/step - loss: 0.3952 - acc: 0.8363
Epoch 29/50
7500/7500 [==============================] - 1s 187us/step - loss: 0.3952 - acc: 0.8360
Epoch 30/50
7500/7500 [==============================] - 1s 182us/step - loss: 0.3947 - acc: 0.8363
Epoch 31/50
7500/7500 [==============================] - 1s 188us/step - loss: 0.3950 - acc: 0.8357
Epoch 32/50
7500/7500 [==============================] - 1s 188us/step - loss: 0.3948 - acc: 0.8365
Epoch 33/50
7500/7500 [==============================] - 1s 183us/step - loss: 0.3945 - acc: 0.8365
Epoch 34/50
7500/7500 [==============================] - 1s 188us/step - loss: 0.3952 - acc: 0.8348
Epoch 35/50
7500/7500 [==============================] - 1s 185us/step - loss: 0.3944 - acc: 0.8371
Epoch 36/50
7500/7500 [==============================] - 1s 188us/step - loss: 0.3943 - acc: 0.8357
Epoch 37/50
7500/7500 [==============================] - 1s 185us/step - loss: 0.3941 - acc: 0.8365
Epoch 38/50
7500/7500 [==============================] - 1s 184us/step - loss: 0.3945 - acc: 0.8357
Epoch 39/50
7500/7500 [==============================] - 1s 187us/step - loss: 0.3940 - acc: 0.8344
Epoch 40/50
7500/7500 [==============================] - 1s 188us/step - loss: 0.3944 - acc: 0.8377
Epoch 41/50
7500/7500 [==============================] - 1s 186us/step - loss: 0.3945 - acc: 0.8377
Epoch 42/50
7500/7500 [==============================] - 1s 190us/step - loss: 0.3939 - acc: 0.8341
Epoch 43/50
7500/7500 [==============================] - 1s 188us/step - loss: 0.3943 - acc: 0.8375
Epoch 44/50
7500/7500 [==============================] - 1s 186us/step - loss: 0.3938 - acc: 0.8371
Epoch 45/50
7500/7500 [==============================] - 1s 186us/step - loss: 0.3944 - acc: 0.8361
Epoch 46/50
7500/7500 [==============================] - 1s 187us/step - loss: 0.3938 - acc: 0.8347
Epoch 47/50
7500/7500 [==============================] - 1s 188us/step - loss: 0.3937 - acc: 0.8363
Epoch 48/50
7500/7500 [==============================] - 1s 187us/step - loss: 0.3938 - acc: 0.8365
Epoch 49/50
7500/7500 [==============================] - 1s 187us/step - loss: 0.3934 - acc: 0.8367
Epoch 50/50
7500/7500 [==============================] - 1s 190us/step - loss: 0.3934 - acc: 0.8360

<keras.callbacks.History at 0x7fb77ca1a358>

The best test accuracy of the Deep Learning Model with 4 hidden layers and no dropout layer is 0.8640. This is good, but not as good as the Random Forest Model.


data.columns

Index(['CreditScore', 'Geography', 'Gender', 'Age', 'Tenure', 'Balance',
'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary',
'Exited'],
dtype='object')

We want to feed a sample data point to our classifier model, and see what prediction it gives us


'''
predicting if the costumer having following information will leave the bank or not ?

Geography : france
Age = 50
Credit score = 850
Tenure = 4
Balance = 150000
Number of Products = 5
Gender = Female
Has Credit Card = yes
Is Active Member = yes
Estimated Salary = 85000
'''

new_prediction = model.predict(sc.transform(np.array([[850, 50, 4, 150000, 5, 1, 1, 85000, 1, 0, 0, 1, 0]])))

new_prediction = (new_prediction > 0.5 )
print(new_prediction)

[[False]]

from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score

from keras.layers import Dense
from keras.models import Sequential

def build_classifier():
# creating the model
model = Sequential()

# first hidden layer
model.add(Dense(output_dim = 8, init = 'uniform', activation = 'relu', input_dim = 13))

# second hidden layer
model.add(Dense(output_dim = 8, init = 'uniform', activation = 'relu'))

# output layer
model.add(Dense(output_dim = 1, init = 'uniform', activation = 'sigmoid'))

# Compiling the NN
# binary_crossentropy loss function used when a binary output is expected
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

return model

model = KerasClassifier(build_fn = build_classifier, batch_size = 10, nb_epoch = 50)
accuracies = cross_val_score(estimator = model, X = x_train, y = y_train, cv = 10, n_jobs = -1)

print("Accuracies :", accuracies)

print("Mean Accuracy :", accuracies.mean())
print("Variance :", accuracies.std())

Accuracies : [0.78533333 0.78933333 0.80133333 0.78933333 0.832      0.79333333
0.776 0.80266666 0.79733333 0.80533333]
Mean Accuracy : 0.7971999967892964
Variance : 0.014328836317248662

 Random Forest Model is better than Deep Learning Classifier model for this case.


Conclusions

1. We were able to do Exploratory Data Analysis and inspect for missing values, wrong values. outliers, class imbalances etc. Everything seemed to be in order.

2. We have tried multiple machine learning models on our data including Artificial Neural Networks, but it looks like the best model was the Random Forest Classifier Model, which was able to predict the customers who will churn with an accuracy of 87.28%.


0 comments