Term Deposits, also known as Fixed Deposits or Time Deposits, are an investment vehicle in which a lump-sum amount is placed for a fixed length of time, ranging from one month to five years, at an agreed rate of interest.
Banks, non-banking financial companies (NBFCs), credit unions, post offices, and building societies are all places where you may get a term deposit.
Term Deposits have certain monetary characteristics, such as a fixed tenure and a guaranteed rate of interest, that have made them popular among investors.
This project can help banks and financial institutions find valuable investors who are interested in buying term deposits.
The classic bank marketing dataset was originally uploaded to the UCI Machine Learning Repository. The dataset contains information on a financial institution's marketing campaign, which you must examine to discover ways to improve the bank's future marketing efforts.
CatBoost is a machine learning algorithm that was recently open-sourced by Yandex.
It's simple to interface with deep learning frameworks like TensorFlow from Google and Core ML from Apple.
It can work with a variety of data formats to assist organizations in addressing various challenges.
To top it off, its accuracy is highly competitive with the best algorithms in the industry.
It is particularly strong in two ways:
1) it produces cutting-edge results without the extensive training data that other machine learning approaches require, and
2) it provides excellent out-of-the-box support for the more descriptive data formats that often accompany business challenges.
The term "CatBoost" is derived from the phrases "Category" and "Boosting."
As previously stated, the library works well with multiple categories of data, including audio, text, image, and historical data.
The "Boost" in the name comes from gradient boosting, the machine learning technique on which the library is built.
Gradient boosting is a powerful machine learning approach that has been successfully applied to a variety of business problems such as fraud detection, recommendation systems, and forecasting.
It can also produce excellent results with a relatively small quantity of data, unlike deep learning models, which require large amounts of data to learn from.
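As a quick illustration of that out-of-the-box categorical support, here is a minimal sketch (using a hypothetical DataFrame df_example and hypothetical column names) that passes raw string columns straight to CatBoost through the cat_features argument instead of encoding them first.
from catboost import CatBoostClassifier, Pool
# Hypothetical example: 'job' and 'marital' are raw string columns,
# 'age' and 'balance' are numeric, and 'deposit' is a 0/1 target.
train_pool = Pool(data=df_example[['age', 'balance', 'job', 'marital']],
                  label=df_example['deposit'],
                  cat_features=['job', 'marital'])  # encoded internally by CatBoost
model = CatBoostClassifier(iterations=200, verbose=False)
model.fit(train_pool)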
Understanding the Code
First, let us import the required libraries for the project.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import mean_squared_error as mse
from sklearn.metrics import r2_score
import joblib
import pickle
And now load the data into the system.
df = pd.read_csv("data.csv")
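Before plotting anything, it can help to take a quick look at the data; the lines below are a small optional check (assuming the file follows the UCI bank-marketing schema with a 'deposit' target column).
print(df.shape)                       # number of rows and columns
print(df.head())                      # first few records
print(df['deposit'].value_counts())   # class balance of the target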
Also, let us look at a few important visualizations of our data.
# Pie-chart
labels = 'Yes', 'No'
sizes = [5868, 5284]
colors = ['black','greenyellow']
fig1, ax1 = plt.subplots(figsize =(7.5,7.5))
ax1.pie(sizes, colors=colors, labels=labels, shadow=True, startangle=90)
ax1.axis('equal')
plt.title("Deposit")
plt.show()
# Pie-chart
labels = 'married', 'single' , 'divorced'
sizes = [6346, 3513, 1293]
colors = ['violet', 'lightgrey','black']
fig1, ax1 = plt.subplots(figsize =(7.5,7.5))
ax1.pie(sizes, colors=colors, labels=labels, shadow=True, startangle=90)
ax1.axis('equal')
plt.title("marital")
plt.show()
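The pie-chart sizes above are hard-coded counts; if you prefer to derive them from the data itself, a sketch like the following (assuming the same 'deposit' column) keeps the plot in sync with whatever file you actually loaded.
# Compute the slice sizes directly from the data instead of hard-coding them
deposit_counts = df['deposit'].value_counts()
fig, ax = plt.subplots(figsize=(7.5, 7.5))
ax.pie(deposit_counts.values, labels=deposit_counts.index, shadow=True, startangle=90)
ax.axis('equal')
plt.title("Deposit")
plt.show()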
Coming to the 'Data Preprocessing' part, let us search for missing values in the data.
df.isnull().sum()
As you can see, no missing values exist in our data.
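Had there been missing values, a typical handling step would look something like the sketch below ('balance' is used here only as a hypothetical numeric column name).
# Only needed if isnull().sum() had reported gaps (it did not here)
df = df.dropna()                                                 # drop incomplete rows
# df['balance'] = df['balance'].fillna(df['balance'].median())   # or impute a numeric column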
Now let us encode the categorical values to feed the data into the model.
from sklearn import preprocessing
categorical = ['job', 'marital', 'education', 'default', 'housing',
               'loan', 'contact', 'month', 'poutcome']
for feature in categorical:
    le = preprocessing.LabelEncoder()
    df[feature] = le.fit_transform(df[feature])
df['deposit'] = df['deposit'].map({'yes': 1, 'no': 0})
As you can see, I used the 'LabelEncoder' class to encode our data.
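One optional refinement, shown as a variant of the loop above rather than code to run after it, is to keep each fitted encoder so the integer codes can be mapped back to their original labels later.
# Variant of the encoding loop that remembers the fitted encoders
encoders = {}
for feature in categorical:
    le = preprocessing.LabelEncoder()
    df[feature] = le.fit_transform(df[feature])
    encoders[feature] = le
print(dict(enumerate(encoders['job'].classes_)))  # integer code -> original label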
Let us split the data into training and testing sets using the "train_test_split" function.
Y = df['deposit']
X = df.drop('deposit', axis = 1)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.2)
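Because the split above is random, results will vary slightly from run to run; fixing a seed and stratifying on the target (an optional tweak, not part of the original code) makes the experiment reproducible and keeps the class ratio the same in both sets.
# Reproducible, class-balanced variant of the split
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42, stratify=Y)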
Finally, we need to scale our data before feeding our data into a model.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = pd.DataFrame(scaler.fit_transform(X_train), columns = X.columns)
X_test = pd.DataFrame(scaler.transform(X_test), columns = X.columns)
As you can see, I used the "StandardScaler" function to scale the data.
Now, let us dive deep into the modelling part of the project.
from catboost import CatBoostClassifier
cb_model = CatBoostClassifier()
cb_model.fit(X_train, y_train)
y_pred = cb_model.predict(X_test)
cb_model.score(X_train, y_train) * 100  # training accuracy (%)
I used the "Catboost" model to solve the problem.
As you can see, I used the "CatBoostClassifier" function to use the "Catboost" algorithm.
Now let us have a look at the model's performance report.
from sklearn.metrics import classification_report
class_names = ['Customers who did not buy the Term Deposit', 'Customers who Bought the Term Deposit']
print(classification_report(y_test, y_pred, target_names=class_names))
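A confusion matrix is a useful companion to the classification report, since it shows exactly where the misclassifications fall.
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test, y_pred))  # rows: actual class, columns: predicted class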
As you can see, the model performs well on the held-out test set and is a good candidate for production use.
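Before deploying, the trained model also needs to be persisted; since joblib was imported earlier, a minimal sketch of saving and reloading the model looks like this (the file name is arbitrary).
# Persist the trained model so it can be reloaded for serving
joblib.dump(cb_model, "catboost_term_deposit.joblib")
loaded_model = joblib.load("catboost_term_deposit.joblib")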
Thank you for your time.