Hashwanth Gogineni's other Models Reports


Video Game Sales Prediction

Model Overview

Video game industry

A video game is a game played on a computer, gaming console, or smartphone. Traditionally, video games were divided by platform into two categories: computer games and console games. In recent years, however, the growth of social networks, cellphones, and tablets has given rise to new categories such as mobile and social games. Video games have come a long way since the first ones were introduced in the 1970s. Photorealistic graphics and realistic simulations are common in today's video games.

Video games have been around for a long time and are a multibillion-dollar industry. The worldwide PC gaming market was predicted to generate around 37 billion dollars in sales in 2020, and the mobile gaming market more than 77 billion dollars. What matters now is that the first generation of gamers has reached maturity and has significant purchasing power. Although youngsters spend a large amount of time playing games every day, gaming is no longer solely a children's pastime. In fact, video gaming is gaining popularity among parents all across the world, with a fairly even gender distribution among parents who play.

Why Video Game Sales Prediction?

The project can help video game companies analyze market trends and plan the games they produce accordingly.


The dataset lists video games that have sold more than 100,000 copies. It was created from a web scrape.

Fields include:

  • Name - The video game's name

  • Platform - The platform the game was released on (e.g. Xbox, PS4)

  • Year - The year of the game's release

  • Genre - Genre of the game

  • Publisher - Publisher of the game

  • NA_Sales - Sales in 'North America' (in millions)

  • Global_Sales - Total worldwide sales.

Decision Tree

The 'decision tree' is one of the most widely used approaches for classification and prediction.

Each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node (terminal node) holds a class label or, for regression problems such as this one, a predicted numeric value.
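To make the node, branch, and leaf vocabulary concrete, here is a minimal sketch of a hand-built tree in plain Python. The thresholds and leaf values are invented for illustration and are not taken from the trained model; only the column names come from the dataset.

```python
# Hypothetical hand-built tree: internal nodes test an attribute,
# branches are the test outcomes, and leaves hold the prediction.
# All thresholds and leaf values below are made up for illustration.
toy_tree = {
    "feature": "NA_Sales", "threshold": 1.0,
    "left": {"value": 0.4},            # leaf reached when NA_Sales <= 1.0
    "right": {                          # subtree reached when NA_Sales > 1.0
        "feature": "Year", "threshold": 2005,
        "left": {"value": 3.1},        # leaf reached when Year <= 2005
        "right": {"value": 2.2},       # leaf reached when Year > 2005
    },
}


def predict_one(node, row):
    """Walk from the root to a leaf, following one branch per test."""
    while "value" not in node:
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["value"]


prediction = predict_one(toy_tree, {"NA_Sales": 2.5, "Year": 2010})
print(prediction)
```

A fitted scikit-learn tree stores the same kind of structure internally, with the tests and leaf values learned from the training data instead of written by hand.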

The following are some of the advantages of decision tree methods:

  • Decision trees produce rules that are easy to understand.

  • Decision trees can perform classification and regression without requiring much computing power.

  • Decision trees can handle both continuous and categorical variables.

  • Decision trees indicate which fields are most important for classification or prediction.
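Two of these advantages, readable rules and field importance, can be inspected directly in scikit-learn. The sketch below uses synthetic data (not the video game dataset) in which only the first feature drives the target; the feature names "informative" and "noise" are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic data: the target depends only on the first of two features.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 2))
y_demo = 3.0 * X_demo[:, 0]

# A shallow tree keeps the printed rules short.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_demo, y_demo)

# Human-readable if/else rules learned by the tree.
rules = export_text(tree, feature_names=["informative", "noise"])
print(rules)

# Relative contribution of each feature to the splits (sums to 1).
importances = tree.feature_importances_
print(importances)
```

On the real data, the same two calls could be applied to the fitted model to see which columns it relies on most.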

Understanding Code

First, let us import the necessary libraries for the project.

import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn import metrics
import pickle
import joblib

Now, let us load the required data into the system.

df = pd.read_csv('data.csv')

Before we start preprocessing our data, let us explore it with a few visualizations.

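# Average global sales per genre
plt.figure(figsize=(15, 10))
sns.barplot(x="Genre", y="Global_Sales", data=df)
plt.show()

# Total global sales per release year (line plot assumed)
y = df.groupby('Year')['Global_Sales'].sum()
y.plot(figsize=(15, 10))
plt.ylabel('Global Sales')
plt.show()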

Coming to the 'Data Preprocessing' part, let us search for missing values in the data.
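The check itself is not shown here. One common way to do it, sketched on a small made-up DataFrame rather than the real data, is to count the null entries per column with pandas; on the actual dataset the same call would be `df.isnull().sum()`.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; column names follow the dataset.
toy = pd.DataFrame({
    "Year": [2006.0, np.nan, 2008.0],
    "Publisher": ["A", "B", None],
    "Global_Sales": [1.2, 0.8, 0.5],
})

# Number of missing values in each column.
missing_counts = toy.isnull().sum()
print(missing_counts)
```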


As you can see, missing values exist in our data, so let us drop the rows that contain them.

df = df.dropna(axis=0, subset=['Year','Publisher', 'Global_Sales'])

Now let us encode the categorical values to feed the data into the model.

# Create one LabelEncoder per categorical column, fit it, and save it for reuse
Platform_encoder = LabelEncoder()
df['Platform'] = Platform_encoder.fit_transform(df['Platform'])
pickle.dump(Platform_encoder, open('Platform_encoder.pkl','wb'))

Genre_encoder = LabelEncoder()
df['Genre'] = Genre_encoder.fit_transform(df['Genre'])
pickle.dump(Genre_encoder, open('Genre_encoder.pkl','wb'))

Publisher_encoder = LabelEncoder()
df['Publisher'] = Publisher_encoder.fit_transform(df['Publisher'])
pickle.dump(Publisher_encoder, open('Publisher_encoder.pkl','wb'))

As you can see, I used the 'LabelEncoder' class to map each category to an integer code, and saved each fitted encoder so the same mapping can be reused later.
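As a small illustration of what the encoder does, the sketch below fits a LabelEncoder on a short, made-up list of genres. It shows that codes are assigned in sorted order of the class names and that the mapping can be reversed.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up sample of genre values for illustration.
genres = ["Sports", "Action", "Racing", "Action"]

encoder = LabelEncoder()
codes = encoder.fit_transform(genres)

# Classes are stored in sorted order; each code is an index into this list.
print(list(encoder.classes_))
print(list(codes))

# The fitted encoder can map the integer codes back to the original labels.
decoded = encoder.inverse_transform(codes)
print(list(decoded))
```

Note that the integer codes imply an ordering (here Action < Racing < Sports) that has no real meaning; tree models can usually work around this, but it is worth keeping in mind.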

Let us split the data into training and testing sets using the "train_test_split" function.

X = df.drop(columns=['Global_Sales', 'Name'])
Y = df['Global_Sales']

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)
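The split above is random, so the reported metrics can change from run to run. A minimal sketch of a reproducible split, assuming a fixed `random_state` is acceptable (the value 42 is arbitrary and the arrays are synthetic):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the feature matrix and target.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# With the same random_state, two calls produce the same partition.
a_train, a_test, _, _ = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
b_train, b_test, _, _ = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

print(a_train.shape, a_test.shape)   # 80% / 20% of the 10 rows
print((a_test == b_test).all())      # identical test rows in both calls
```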

Finally, let us scale the features before feeding them into the model.

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

X_train = pd.DataFrame(scaler.fit_transform(X_train), columns = X.columns)

X_test = pd.DataFrame(scaler.transform(X_test), columns = X.columns)

pickle.dump(scaler, open('scaler.pkl','wb'))

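As you can see, I used the "MinMaxScaler" class to scale the data. The scaler is fitted on the training set only and then applied to the test set, so no information from the test set leaks into preprocessing.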
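For reference, MinMaxScaler rescales each feature as (x - min) / (max - min), using the minimum and maximum seen during fitting. The sketch below checks this against a manual calculation on made-up year values; note that test values outside the training range fall outside [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up single-feature data for illustration.
train = np.array([[1980.0], [2000.0], [2020.0]])
test = np.array([[2010.0], [2030.0]])

# Fit on the training data only, then transform the test data.
demo_scaler = MinMaxScaler().fit(train)
scaled = demo_scaler.transform(test)

# Same result computed by hand from the training min and max.
manual = (test - train.min(axis=0)) / (train.max(axis=0) - train.min(axis=0))

print(scaled.ravel())
print(np.allclose(scaled, manual))
```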

Now, let us dive deep into the modelling part of the project.

from sklearn.tree import DecisionTreeRegressor

dt_model = DecisionTreeRegressor()
dt_model.fit(X_train, y_train)    # train on the scaled training features
y_pred = dt_model.predict(X_test)
dt_model.score(X_train, y_train)  # R^2 on the training set

I used the "DecisionTreeRegressor" class from scikit-learn to apply the "Decision Tree" algorithm to the problem.
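One caveat about the training-set score computed above: an unrestricted decision tree can memorize its training data, so that score is usually close to 1 and says little about generalization. The sketch below illustrates this on synthetic noisy data (not the video game dataset) by comparing the training and test scores.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: the target is the first feature plus random noise.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(300, 3))
y_demo = X_demo[:, 0] + rng.normal(0, 1.0, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# No depth limit: the tree grows until it fits the training rows exactly.
model = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)

train_r2 = model.score(X_tr, y_tr)   # R^2 on data the tree has seen
test_r2 = model.score(X_te, y_te)    # R^2 on held-out data
print(train_r2, test_r2)
```

If the gap between the two scores is large, limiting the tree with parameters such as `max_depth` or `min_samples_leaf` is a common remedy.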

Now let us have a look at the model's performance report.

print('r2 score', metrics.r2_score(y_test, y_pred))
print('MAE:', metrics.mean_absolute_error(y_test, y_pred))
print('MSE:', metrics.mean_squared_error(y_test, y_pred))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

As you can see, the model performed well on the test data.
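For readers unfamiliar with these metrics, the sketch below computes each one by hand on four made-up values and compares the results with scikit-learn. The numbers are illustrative only and are not results from this model.

```python
import numpy as np
from sklearn import metrics

# Made-up true values and predictions for illustration.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.5, 2.0, 2.5, 5.0])

err = y_true - y_hat

mae = np.mean(np.abs(err))    # mean absolute error
mse = np.mean(err ** 2)       # mean squared error
rmse = np.sqrt(mse)           # root mean squared error, in the target's units
# R^2: 1 minus (residual sum of squares / total sum of squares)
r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mae, mse, rmse, r2)
print(np.isclose(mae, metrics.mean_absolute_error(y_true, y_hat)))
print(np.isclose(mse, metrics.mean_squared_error(y_true, y_hat)))
print(np.isclose(r2, metrics.r2_score(y_true, y_hat)))
```

Lower MAE, MSE, and RMSE indicate smaller errors, while an R^2 closer to 1 indicates that the model explains more of the variance in the target.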

Thank you for your time.