Hotel Booking Cancellation Prediction

Hashwanth Gogineni

Hotel Booking:

The hotel industry is one of the faster-growing businesses in the tourism sector, helped by the rise of large online travel agencies (OTAs) that have made booking a hotel easier than ever. According to Portugal's National Institute of Statistics, hotel revenue rose approximately 18% in 2017, to $3.6 billion. The growth also shows in Portugal's total number of hotel guests, which in 2017 was roughly double the country's population:

Total of hotel guests in 2017: '20.6 Million'

Total Portugal Population in 2017: '10.31 Million'

According to the 'Deloitte Hospitality Atlas 2019', Lisbon was named the most attractive European city for hotel investment.

However, this growth brings problems too, one of which is a rising rate of cancellations. The cancellation rate rose from under 33% in 2014 to 40% in 2018.

Project Implementation:

Travel companies and hotels can use the project to anticipate cancellations, retain their customers and scale their businesses. The project also includes an analysis of customer data, which can help organizations in the hotel industry understand customer behaviour.

Dataset:

The data is originally from the article 'Hotel Booking Demand Datasets,' written by 'Nuno Antonio,' 'Ana Almeida,' and 'Luis Nunes' for Data in Brief, Volume 22, February 2019.

The data was downloaded and cleaned by 'Thomas Mock' and 'Antoine Bichat' for #TidyTuesday during the week of February 11th, 2020.

This data contains booking information for a 'city hotel' and a 'resort hotel.' In addition, it contains information such as when the booking was made, length of stay, the number of adults, children, and babies, and the number of available parking spaces, among other things.

Random Forest:

'Random forest' is a supervised ensemble learning algorithm that can be used for both classification and regression problems. A forest is made up of trees, and, loosely speaking, more trees make a more robust forest. In the same way, the random forest algorithm builds many decision trees on random samples of the data, collects a prediction from each tree, and selects the final answer by majority vote (or by averaging, for regression). It usually generalizes better than a single decision tree because combining many trees reduces over-fitting.

The fundamental concept behind random forest is a simple but powerful one: the wisdom of crowds.
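
To make the "wisdom of crowds" idea concrete, here is a minimal sketch that compares a single decision tree with a forest of 100 trees. It assumes synthetic data from scikit-learn's 'make_classification' (not the hotel dataset), and all variable names are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (an assumption, not the hotel bookings)
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=20, n_informative=8, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# One tree versus an ensemble of 100 trees trained on bootstrap samples
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

tree_acc = tree.score(X_te, y_te)
forest_acc = forest.score(X_te, y_te)
print(f"single tree accuracy: {tree_acc:.3f}")
print(f"random forest accuracy: {forest_acc:.3f}")
```

On data like this the forest will typically score higher than the single tree, although the exact numbers depend on the data and the random seed.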

Understanding Code:

First, let us import the required libraries for the project.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import folium
import plotly.express as px
from sklearn.preprocessing import LabelEncoder
from sklearn import preprocessing
from sklearn.ensemble import RandomForestClassifier
import pickle
import joblib
from sklearn.metrics import classification_report

Now, let us load the data.

df = pd.read_csv('hotel_bookings.csv')

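Next, let us look at two visualizations of the data: the number of bookings per arrival month, and a world map of guests by country.

fig = plt.figure(figsize=(10, 5))
sns.countplot(data=df, x='arrival_date_month')
plt.xlabel('Month', fontsize=15)
plt.xticks(rotation=45, fontsize=11)
plt.show()

basemap = folium.Map()  # base world map object; the plotly figure below does not use it

# Assumption: 'country_wise_guests' is not defined in the listing;
# here it is built as the number of bookings per country code.
country_wise_guests = df['country'].value_counts().reset_index()
country_wise_guests.columns = ['country', 'No of guests']

guests_map = px.choropleth(country_wise_guests, locations=country_wise_guests['country'],
                           color=country_wise_guests['No of guests'], hover_name=country_wise_guests['country'])
guests_map.show()

As you can see, 'folium.Map()' creates a base world map, while the map of guests per country is drawn with plotly's 'px.choropleth' function.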

Coming to the 'Data Preprocessing' part, let us first check for missing values in the data.

df.isnull().sum()

Missing values exist in the 'agent', 'children', 'country' and 'company' columns. So let us fill 'agent' and 'children' with 0, fill 'country' with 'PRT' (its most frequent value), and drop the 'company' column because most of its values are missing.

df['agent'] = df['agent'].fillna(0)
df['children'] = df['children'].fillna(0)
df['country'] = df['country'].fillna('PRT')
df = df.drop('company', axis=1)
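
The same inspect-then-fill pattern can be tried on a small frame first. The sketch below assumes a toy DataFrame with made-up values whose column names mirror the hotel dataset; 'missing_share' and 'remaining' are illustrative names.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; column names mirror the hotel dataset
toy = pd.DataFrame({
    'agent': [9.0, np.nan, 240.0, np.nan],
    'children': [0.0, 1.0, np.nan, 0.0],
    'country': ['PRT', 'GBR', None, 'PRT'],
    'company': [np.nan, np.nan, np.nan, 40.0],
})

# Fraction of missing values per column helps decide between filling and dropping
missing_share = toy.isnull().mean()
print(missing_share)

# Fill sparse gaps, and drop the column that is mostly empty
toy['agent'] = toy['agent'].fillna(0)
toy['children'] = toy['children'].fillna(0)
toy['country'] = toy['country'].fillna(toy['country'].mode()[0])
toy = toy.drop(columns='company')

remaining = int(toy.isnull().sum().sum())
print("missing values left:", remaining)
```

Looking at the share of missing values, rather than the raw count, makes it easier to justify dropping a column such as 'company'.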

Let us generate a correlation heatmap to see how the numeric features relate to each other and to the target, 'is_canceled'.

# numeric_only=True skips the text columns, which recent pandas versions cannot correlate
corr = df.corr(numeric_only=True)

fig, axes = plt.subplots(1, 1, figsize=(20, 10))
sns.heatmap(corr, annot=True)
plt.show()

I used the 'sns.heatmap' function to draw the correlation matrix of our data.
A few features are not useful for our model to predict the output, so let us remove those features from our data using the 'drop' function.

# dropping columns that are not useful
useless_col = ['arrival_date_year', 'assigned_room_type', 'reservation_status', 'country', 'arrival_date_month']
df.drop(useless_col, axis=1, inplace=True)
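
One way to put numbers behind a feature-selection step like this is to rank the numeric features by their absolute correlation with the target. The sketch below is only an illustration: it assumes a synthetic frame in which 'lead_time' drives a made-up 'is_canceled' label, and 'target_corr' is a hypothetical name.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
lead_time = rng.integers(0, 300, size=n)

# Made-up target: longer lead times make a cancellation more likely
is_canceled = (lead_time + rng.normal(0, 60, size=n) > 150).astype(int)

demo = pd.DataFrame({
    'lead_time': lead_time,
    'noise': rng.normal(size=n),      # unrelated to the target
    'hotel': ['City Hotel'] * n,      # text column, excluded from the correlation
    'is_canceled': is_canceled,
})

# Absolute correlation of each numeric feature with the target, strongest first
target_corr = (
    demo.select_dtypes(include='number')
        .corr()['is_canceled']
        .drop('is_canceled')
        .abs()
        .sort_values(ascending=False)
)
print(target_corr)
```

Correlation only captures linear relationships between numeric columns, so a low value is a hint rather than proof that a feature is useless.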

Before we move on to encoding, let us split the 'reservation_status_date' feature into 'year', 'month' and 'day' features using the pandas '.dt' accessor.

df['reservation_status_date'] = pd.to_datetime(df['reservation_status_date'])
df['year'] = df['reservation_status_date'].dt.year
df['month'] = df['reservation_status_date'].dt.month
df['day'] = df['reservation_status_date'].dt.day

df.drop(['reservation_status_date'], axis=1, inplace=True)
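
As a small worked example of the date split, the sketch below applies the same steps to three made-up date strings. The frame name 'dates' is illustrative only.

```python
import pandas as pd

# Three made-up dates in the same text format as the dataset column
dates = pd.DataFrame({
    'reservation_status_date': ['2015-07-01', '2016-12-31', '2017-02-14']
})

# Parse the strings, then read the parts through the .dt accessor
dates['reservation_status_date'] = pd.to_datetime(dates['reservation_status_date'])
dates['year'] = dates['reservation_status_date'].dt.year
dates['month'] = dates['reservation_status_date'].dt.month
dates['day'] = dates['reservation_status_date'].dt.day

dates = dates.drop(columns='reservation_status_date')
print(dates)
```

Each row now carries three integer columns in place of one date string, which models that expect numeric input can consume directly.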

Now, let us encode the data as integers using scikit-learn's 'LabelEncoder' class. Note that the list below, although named 'categorical', also includes numeric columns such as 'lead_time' and 'adr', and the target 'is_canceled'; these are replaced by integer codes as well.

from sklearn import preprocessing

categorical = ['hotel', 'lead_time', 'arrival_date_week_number',
       'arrival_date_day_of_month', 'stays_in_weekend_nights',
       'stays_in_week_nights', 'adults', 'children', 'babies', 'meal',
       'market_segment', 'distribution_channel', 'is_repeated_guest',
       'previous_cancellations', 'previous_bookings_not_canceled',
       'reserved_room_type', 'booking_changes', 'deposit_type', 'agent',
       'days_in_waiting_list', 'customer_type', 'adr',
       'required_car_parking_spaces', 'total_of_special_requests',
       'is_canceled', 'year', 'month', 'day']

# fit a fresh encoder per column and replace the values with integer codes
for feature in categorical:
    le = preprocessing.LabelEncoder()
    df[feature] = le.fit_transform(df[feature])
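
To see what the encoder does to a single column, here is a minimal sketch on a short list of meal codes. The list is made up for illustration, although the codes themselves resemble values of the 'meal' column.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up sample of meal codes
meals = ['BB', 'HB', 'SC', 'BB', 'FB']

le = LabelEncoder()
codes = le.fit_transform(meals)

# classes_ holds the sorted unique labels; each code is an index into it
print(list(le.classes_))
print(codes.tolist())

# The mapping is reversible as long as the fitted encoder is kept
decoded = le.inverse_transform(codes)
print(list(decoded))
```

Because the loop in the write-up creates a new encoder for each column and does not keep it, the original labels cannot be recovered afterwards; storing the fitted encoders (for example in a dictionary keyed by column name) would be one way to keep the mapping.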

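Before we feed our dataset into the model, we separate the features from the target, hold out a validation set, and scale the features using the 'StandardScaler' class. The scaler is fitted on the training data only and then applied to the validation data.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: the listing does not show how X, Y and X_val were created;
# an 80/20 split with 'is_canceled' as the target is used here for illustration.
X, X_val, Y, Y_val = train_test_split(df.drop('is_canceled', axis=1), df['is_canceled'], test_size=0.2, random_state=42)

scaler = StandardScaler()
X = scaler.fit_transform(X)
X_val = scaler.transform(X_val)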
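
The difference between 'fit_transform' and 'transform' is easier to see on a tiny array. The sketch below assumes two made-up numeric columns; 'train' and 'val' are illustrative names.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training rows and one validation row
train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
val = np.array([[2.0, 400.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train only
val_scaled = scaler.transform(val)          # reuses the training mean and std

print("learned means:", scaler.mean_)
print("scaled training columns have mean", train_scaled.mean(axis=0))
print("scaled validation row:", val_scaled)
```

The validation row is scaled with the statistics learned from the training rows, so no information from the validation set leaks into preprocessing.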

Finally, I used the 'Random Forest' algorithm to solve the use case.

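from sklearn.ensemble import RandomForestClassifier

# a forest of 400 trees, trained on the scaled training features and labels
model_rf = RandomForestClassifier(n_estimators=400)
model_rf.fit(X, Y)

# predict cancellations for the validation set
Y_Pred = model_rf.predict(X_val)

As you can see, I used the 'RandomForestClassifier' class to fit a random forest with 400 trees on our training data and to predict on the validation data.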

Let us create a classification report, using the 'classification_report' function imported earlier, for a better understanding of the model's results.

According to the report, the model performed really well on the validation data.
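
The report for the hotel model is not reproduced in this write-up. As a sketch of how such a report can be produced, the example below trains a small forest on synthetic data (an assumption, not the hotel bookings) and prints precision, recall and F1 per class. The class names passed to 'target_names' are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a binary target
X_demo, y_demo = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

# Text report: precision, recall, F1 and support for each class
report_text = classification_report(
    y_te, y_pred, target_names=['not canceled', 'canceled']
)
print(report_text)

# The same numbers as a dictionary, handy for logging or comparisons
report = classification_report(y_te, y_pred, output_dict=True)
print("accuracy:", report['accuracy'])
```

For a cancellation model, the recall on the 'canceled' class is often the number to watch, since it shows how many actual cancellations the model catches.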

Finally, let us save the model we trained.

# the 'with' block closes the file once the model has been written
with open('model.pkl', 'wb') as f:
    pickle.dump(model_rf, f)
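
A saved model is only useful if it can be loaded back. The sketch below shows a save-and-load round trip under stated assumptions: a small forest trained on synthetic data and a temporary directory instead of the project folder.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small stand-in model trained on synthetic data
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_demo, y_demo)

# Write the model to a temporary file
path = os.path.join(tempfile.mkdtemp(), 'model.pkl')
with open(path, 'wb') as f:
    pickle.dump(model, f)

# Read it back and check that predictions are unchanged
with open(path, 'rb') as f:
    restored = pickle.load(f)

same = bool((restored.predict(X_demo) == model.predict(X_demo)).all())
print("predictions identical after reload:", same)
```

Pickle files should only be loaded from trusted sources, and new data must go through the same preprocessing steps before it is passed to the restored model.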

Thank you for your time.
