Hashwanth Gogineni's other Models Reports


Employee Attrition Prediction


Employee Attrition


Employee attrition is the gradual shrinking of a workforce as employees leave, for example by resigning for personal or professional reasons, and are not replaced. Employees depart faster than they are hired, often for reasons beyond the company's control. Suppose you have recently opened a new office that will serve as your company's sales hub. Every salesperson is required to work from this location, but a few are unable to relocate and must leave the organisation. This is a common cause of attrition. Other contributing factors include a lack of professional development, a hostile work environment, a loss of faith in the company's market value, and ineffective leadership.



Why Employee Attrition Prediction?


This project can help companies identify the conditions that lead to employee attrition and take action to address them.


Dataset


The dataset has '24' columns: '23' input features, i.e. 'Age', 'BusinessTravel', 'Department', 'DistanceFromHome', 'Education', 'EducationField', 'EmployeeCount', 'EmployeeID', 'Gender', 'JobLevel', 'JobRole', 'MaritalStatus', 'MonthlyIncome', 'NumCompaniesWorked', 'Over18', 'PercentSalaryHike', 'StandardHours', 'StockOptionLevel', 'TotalWorkingYears', 'TrainingTimesLastYear', 'YearsAtCompany', 'YearsSinceLastPromotion', 'YearsWithCurrManager', and the target column 'Attrition'. The dataset has a total of '1412' instances.




XGBoost


XGBoost is a gradient boosting algorithm, a common technique in ensemble learning. Ensemble learning is a kind of machine learning that combines the predictions of multiple models. Boosting differs from other ensemble techniques in that it trains a sequence of weak models, each one correcting the errors of those before it, so that the combined model becomes increasingly powerful. Gradient boosting algorithms use the gradient of a loss function, which measures the model's error, to decide how to build the next model. Gradient boosting is the foundation of several widely used machine learning libraries. XGBoost is used in many competition-winning models and is widely cited in research, which has earned it a strong reputation among boosting methods.
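To make the boosting idea above concrete, here is a minimal sketch of gradient boosting written by hand. It is not how XGBoost is implemented (XGBoost also uses second-order gradient information and regularization); it only illustrates the loop of fitting weak models to the gradient of the loss. The toy data, the learning rate, and the number of rounds are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (made up for illustration, not the attrition dataset).
rng = np.random.default_rng(0)
X_toy = rng.uniform(-3, 3, size=(200, 1))
y_toy = np.sin(X_toy[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1   # assumed step size
n_rounds = 50         # assumed number of boosting rounds

# Start from a constant prediction; the mean minimizes squared error.
prediction = np.full_like(y_toy, y_toy.mean())
initial_mse = np.mean((y_toy - prediction) ** 2)

weak_models = []
for _ in range(n_rounds):
    # For the loss 0.5 * (y - F)^2, the negative gradient is the residual.
    residual = y_toy - prediction
    stump = DecisionTreeRegressor(max_depth=1)  # a deliberately weak model
    stump.fit(X_toy, residual)
    # Take a small step in the direction suggested by the weak model.
    prediction += learning_rate * stump.predict(X_toy)
    weak_models.append(stump)

final_mse = np.mean((y_toy - prediction) ** 2)
print(f"MSE before boosting: {initial_mse:.4f}")
print(f"MSE after {n_rounds} rounds: {final_mse:.4f}")
```

Each depth-1 tree is a poor predictor on its own, but the training error of the sum decreases round after round.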





Understanding Code


First, let us import the required libraries for the project.


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
import pickle
from sklearn.model_selection import train_test_split
import joblib
from sklearn.metrics import classification_report


And now load the data into the system.


df = pd.read_csv("data.csv")


Also, let us look at a few important visualizations of our data.


# Pie-chart
labels = 'Yes', 'No'
sizes = [706, 705]
colors = ['lightblue','Pink']
fig1, ax1 = plt.subplots(figsize =(7.5,7.5))
ax1.pie(sizes,colors = colors , labels=labels, shadow=True, startangle=90)
ax1.axis('equal')
plt.title("Attrition")
plt.show()
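The pie chart above uses hard-coded counts. As a sketch, the same labels and sizes could be derived from the data with "value_counts", so that they stay in sync if the data changes. The small DataFrame below is a made-up stand-in for the real "df".

```python
import pandas as pd

# Tiny stand-in for the real data (values are made up for illustration).
toy_df = pd.DataFrame({"Attrition": ["Yes", "No", "No", "Yes", "No"]})

# Count how often each class occurs, most frequent first.
counts = toy_df["Attrition"].value_counts()
pie_labels = counts.index.tolist()
pie_sizes = counts.tolist()

print(pie_labels, pie_sizes)
```

The resulting "pie_labels" and "pie_sizes" lists can be passed to "ax1.pie" in place of the hard-coded values.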




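# Bar plot of the attrition rate per department ('Yes' counts as 1, 'No' as 0)
plt.figure(figsize=(15, 10))
sns.barplot(x="Department", y=df["Attrition"].map({"No": 0, "Yes": 1}), data=df)
plt.show()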




Coming to the 'Data Preprocessing' part, let us search for missing values in the data.


df.isnull().sum()



As you can see, missing values exist in our data. So let us drop the rows that have missing values in 'NumCompaniesWorked' or 'TotalWorkingYears'.


df = df.dropna(axis=0, subset=['NumCompaniesWorked', 'TotalWorkingYears'])
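Here is a small, self-contained sketch of what these two calls do. The toy values are made up; only the column names come from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame with gaps in the two columns used above (values are made up).
toy_df = pd.DataFrame({
    "NumCompaniesWorked": [1.0, np.nan, 3.0, 2.0],
    "TotalWorkingYears": [5.0, 8.0, np.nan, 10.0],
    "Age": [30, 41, 25, 38],
})

# Number of missing values in each column.
missing_per_column = toy_df.isnull().sum()

# Keep only the rows where both listed columns are present.
cleaned = toy_df.dropna(axis=0, subset=["NumCompaniesWorked", "TotalWorkingYears"])

print(missing_per_column)
print(f"Rows before: {len(toy_df)}, rows after: {len(cleaned)}")
```

Note that "subset" limits the check to the listed columns, so missing values in any other column would not cause a row to be dropped.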

Now let us encode the categorical values to feed the data into the model.


# Encode each categorical column with its own LabelEncoder and save the
# fitted encoder so the same mapping can be reused at prediction time.
categorical_columns = ['BusinessTravel', 'Department', 'EducationField', 'Gender',
                       'JobRole', 'MaritalStatus', 'Over18']

for column in categorical_columns:
    encoder = LabelEncoder()
    df[column] = encoder.fit_transform(df[column])
    with open(f'{column}_encoder.pkl', 'wb') as f:
        pickle.dump(encoder, f)

# Encode the target: 'No' -> 0, 'Yes' -> 1
df['Attrition'] = df['Attrition'].replace({'No': 0, 'Yes': 1})

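As you can see, I used scikit-learn's 'LabelEncoder' class to encode the categorical columns, and saved each fitted encoder with 'pickle' so that the same mapping can be reused later.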
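The sketch below shows what a fitted 'LabelEncoder' does and why saving it is useful. The category strings are made-up examples, and the encoder is pickled in memory rather than to a file to keep the example self-contained.

```python
import pickle
from sklearn.preprocessing import LabelEncoder

# Made-up category values for illustration.
travel_values = ["Travel_Rarely", "Non-Travel", "Travel_Frequently", "Travel_Rarely"]

toy_encoder = LabelEncoder()
codes = toy_encoder.fit_transform(travel_values)

# The learned classes are sorted; each class's position is its integer code.
print(list(toy_encoder.classes_))
print(list(codes))

# Simulate saving and reloading the encoder (in memory instead of a file).
restored = pickle.loads(pickle.dumps(toy_encoder))

# The restored encoder applies the same mapping to new data.
new_codes = restored.transform(["Non-Travel", "Travel_Rarely"])
decoded = restored.inverse_transform(new_codes)
print(list(new_codes), list(decoded))
```

A reloaded encoder raises a ValueError for a category it did not see during fitting, so new data has to use the same category values as the training data.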

Let us split the data into training and testing sets using the "train_test_split" function.


X = df.drop(columns=['Attrition'])
Y = df['Attrition']

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)
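The split above is random, so each run produces a different partition. As an optional variation (an assumption on my part, not what the code above does), "train_test_split" also accepts "random_state" to make the split reproducible and "stratify" to keep the class proportions the same in both sets. A sketch on made-up data:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data: 100 rows, 20% of them in the positive class.
toy_X = pd.DataFrame({"feature": np.arange(100)})
toy_y = pd.Series([1] * 20 + [0] * 80)

X_tr, X_te, y_tr, y_te = train_test_split(
    toy_X, toy_y,
    test_size=0.2,
    random_state=42,   # fixed seed, so the split is the same on every run
    stratify=toy_y,    # keep the 20/80 class ratio in both sets
)

print("Positive share in train:", y_tr.mean())
print("Positive share in test:", y_te.mean())
```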


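Finally, let us scale our data before feeding it into a model. Tree-based models such as XGBoost are largely insensitive to feature scaling, so this step is optional here, but it keeps the pipeline usable with other models.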


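scaler = MinMaxScaler()

# Fit the scaler on the training set only, then apply it to both sets.
# Passing the original index keeps the rows aligned with y_train and y_test.
X_train = pd.DataFrame(scaler.fit_transform(X_train), columns=X.columns, index=X_train.index)

X_test = pd.DataFrame(scaler.transform(X_test), columns=X.columns, index=X_test.index)

with open('scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)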

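As you can see, I used the "MinMaxScaler" class to scale the data. By default it rescales each feature to the range 0 to 1, using the minimum and maximum seen in the training set.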
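As a small worked example of the scaling step, the sketch below fits a scaler on made-up training values and applies it to made-up test values. It also reproduces the result by hand with the formula (x - min) / (max - min), where min and max come from the training data only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up single-feature data for illustration.
train_values = np.array([[20.0], [30.0], [60.0]])
test_values = np.array([[40.0], [70.0]])

toy_scaler = MinMaxScaler()
train_scaled = toy_scaler.fit_transform(train_values)  # learns min=20, max=60
test_scaled = toy_scaler.transform(test_values)        # reuses the training min/max

# The same transformation written out by hand.
train_min = train_values.min(axis=0)
train_max = train_values.max(axis=0)
manual_test_scaled = (test_values - train_min) / (train_max - train_min)

print(train_scaled.ravel())
print(test_scaled.ravel())
```

Test values outside the training range end up outside 0 to 1 (here, 70 maps to 1.25), which is expected when the scaler is fitted on the training set only.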

Now, let us dive deep into the modelling part of the project.


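from xgboost import XGBClassifier


xgb_model = XGBClassifier()
xgb_model.fit(X_train, y_train)
y_pred = xgb_model.predict(X_test)

# Accuracy (in %) on the training set and on the held-out test set
print("Train accuracy:", xgb_model.score(X_train, y_train) * 100)
print("Test accuracy:", xgb_model.score(X_test, y_test) * 100)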

As you can see, I used the "XGBoost" algorithm, through the "XGBClassifier" class, to solve the problem.

Now let us have a look at the model's performance report.


from sklearn.metrics import classification_report
class_names = ['Employees who did not leave the Company', 'Employees Who Left the Company']
print(classification_report(y_test, y_pred, target_names=class_names))


The report lists the precision, recall, and F1-score for each class on the test set. As you can see, our model performed well.
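To show how the numbers in such a report are derived, here is a sketch that computes precision and recall by hand from a confusion matrix. The labels and predictions are made up for illustration and are not the model's actual results.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels and predictions (1 = left the company, 0 = stayed).
toy_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
toy_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# For binary labels, ravel() yields: true neg, false pos, false neg, true pos.
tn, fp, fn, tp = confusion_matrix(toy_true, toy_pred).ravel()

# Precision: of the employees predicted to leave, how many actually left.
precision = tp / (tp + fp)
# Recall: of the employees who actually left, how many were caught.
recall = tp / (tp + fn)

print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

For attrition, recall on the "left the company" class is often the more interesting number, since a missed leaver is usually costlier than a false alarm. That is a judgment call, not something the report decides for you.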



Thank you for your time.

