A video game is an electronic game played on a computer, a gaming console, or a smartphone. Traditionally, video games were divided by platform into two categories: computer games and console games. In recent years, however, the growth of social networks, smartphones, and tablets has led to new categories such as mobile and social games. Video games have come a long way since the first ones were introduced in the 1970s. Photorealistic graphics and convincing simulations of reality are common in today's video games.
Video games are a multibillion-dollar industry. For 2020, the worldwide PC gaming market was predicted to generate around 37 billion dollars in sales, and the mobile gaming market more than 77 billion dollars. What matters now is that the first generation of gamers has grown up and has significant purchasing power. Although children spend a large amount of time playing games every day, gaming is no longer solely a children's pastime. In fact, video gaming has been found to be gaining popularity among parents around the world, with a fairly even gender distribution among gaming parents.
This project can help video game companies analyze market trends and plan their games accordingly.
The dataset lists video games that have sold more than 100,000 copies. It was created from a scrape of vgchartz.com.
Fields include:
The 'decision tree' is one of the most widely used approaches for classification and prediction.
Each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node (terminal node) holds a class label or, in a regression tree such as the one used in this project, a predicted numeric value.
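To make the node, branch, and leaf vocabulary concrete, here is a minimal sketch on made-up toy data (the values and the names `X_toy`, `y_toy`, and `tree` are hypothetical, not part of the project). It fits a depth-1 regression tree and prints its structure with scikit-learn's `export_text`, so you can see one test at the root and a predicted value in each leaf.

```python
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical toy data: one feature, and the target jumps from 1.0 to 5.0
# between x = 3 and x = 4.
X_toy = [[1], [2], [3], [4], [5], [6]]
y_toy = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]

# max_depth=1 allows a single internal node (one test) and two leaves.
tree = DecisionTreeRegressor(max_depth=1, random_state=0)
tree.fit(X_toy, y_toy)

# Text rendering of the tree: the root test on "x" and the value in each leaf.
print(export_text(tree, feature_names=["x"]))

# Each prediction is the value stored in the leaf that the sample falls into.
print(tree.predict([[2], [5]]))
```

The root tests whether `x` is below a threshold placed midway between 3 and 4, and each leaf stores the mean target of the training rows that reached it.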
The following are some of the advantages of decision tree methods:
First, let us import the necessary libraries for the project.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn import metrics
import pickle
import joblib
Now, let us load the required data into the system.
df = pd.read_csv('data.csv')
Before we start preprocessing our data, let us explore it using a few visualizations.
# Average global sales per genre (seaborn's barplot plots the mean by default)
plt.figure(figsize=(15, 10))
sns.barplot(x="Genre", y="Global_Sales", data=df)
plt.show()
# Total global sales per year
y = df.groupby('Year')['Global_Sales'].sum()
plt.figure(figsize=(15, 10))
plt.bar(y.index, y)
plt.xlabel('Year')
plt.ylabel('Global Sales')
plt.show()
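To see what the yearly aggregation behind the second chart computes, here is a small sketch on a made-up table (the numbers and the names `toy_sales` and `yearly` are hypothetical, not taken from the dataset). It only assumes the same 'Year' and 'Global_Sales' column names used above.

```python
import pandas as pd

# Hypothetical miniature of the sales table.
toy_sales = pd.DataFrame({
    "Year": [2008, 2008, 2009, 2009, 2009],
    "Global_Sales": [1.0, 2.5, 0.5, 0.5, 3.0],
})

# One row per year, holding the sum of Global_Sales over that year's games.
yearly = toy_sales.groupby("Year")["Global_Sales"].sum()
print(yearly)
```

The result is a Series indexed by year, which is why the bar chart can use its index for the x-axis and its values for the bar heights.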
Coming to the 'Data Preprocessing' part, let us search for missing values in the data.
df.isnull().sum()
As you can see, there are missing values in our data, so let us drop the rows that have a missing 'Year', 'Publisher', or 'Global_Sales' value.
df = df.dropna(axis=0, subset=['Year','Publisher', 'Global_Sales'])
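The effect of these two calls is easier to follow on a tiny example. The sketch below uses a made-up four-row table (all values and the names `toy`, `missing_per_column`, and `cleaned` are hypothetical) with the same column names as above.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the sales table with two incomplete rows.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Year": [2006.0, np.nan, 2010.0, 2012.0],
    "Publisher": ["P1", "P2", None, "P1"],
    "Global_Sales": [1.2, 0.5, 0.3, 2.0],
})

# Count the missing entries in each column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Drop a row only if one of the listed columns is missing in it.
cleaned = toy.dropna(axis=0, subset=["Year", "Publisher", "Global_Sales"])
print(len(toy), "rows before,", len(cleaned), "rows after")
```

Because of `subset`, a missing value in a column that is not listed would not cause the row to be dropped.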
Now let us encode the categorical values to feed the data into the model.
# Encode each categorical column and save its fitted encoder for later reuse
Platform_encoder = LabelEncoder()
df['Platform'] = Platform_encoder.fit_transform(df['Platform'])
with open('Platform_encoder.pkl', 'wb') as f:
    pickle.dump(Platform_encoder, f)
Genre_encoder = LabelEncoder()
df['Genre'] = Genre_encoder.fit_transform(df['Genre'])
with open('Genre_encoder.pkl', 'wb') as f:
    pickle.dump(Genre_encoder, f)
Publisher_encoder = LabelEncoder()
df['Publisher'] = Publisher_encoder.fit_transform(df['Publisher'])
with open('Publisher_encoder.pkl', 'wb') as f:
    pickle.dump(Publisher_encoder, f)
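As you can see, I used scikit-learn's 'LabelEncoder' class to turn each categorical column into integer codes, and saved each fitted encoder so the same mapping can be reused later.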
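Here is a minimal sketch of what label encoding does, using a made-up list of genres (the values and the names `genres`, `encoder`, `codes`, and `restored` are hypothetical). It also round-trips the encoder through pickle in memory, to show that a saved encoder can map the codes back to the original labels.

```python
import pickle
from sklearn.preprocessing import LabelEncoder

# Hypothetical genre values.
genres = ["Sports", "Action", "Puzzle", "Action"]

encoder = LabelEncoder()
codes = encoder.fit_transform(genres)

# Classes are stored in sorted order, and each label's code is its position.
print(list(encoder.classes_))
print(list(codes))

# Serialize and restore the fitted encoder (in memory instead of a file).
restored = pickle.loads(pickle.dumps(encoder))
print(list(restored.inverse_transform(codes)))
```

Note that a fitted `LabelEncoder` raises an error when asked to transform a label it did not see during fitting, so new data has to use the same set of categories.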
Let us split the data using the "train_test_split" function into training and testing sets.
X = df.drop(columns=['Global_Sales', 'Name'])
Y = df['Global_Sales']
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)
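The split above is random, so each run produces different training and testing sets. As a sketch of one way to make it reproducible, the example below passes `random_state` (the seed value 42 and the arrays `X_demo` and `y_demo` are arbitrary assumptions for illustration) and checks that the same seed yields the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples with 2 features each.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# test_size=0.2 keeps 20% of the rows for testing.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(X_tr.shape, X_te.shape)

# Repeating the call with the same seed selects the same test rows.
_, X_te_again, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(np.array_equal(X_te, X_te_again))
```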
Finally, let us scale the data before feeding it into the model. Decision trees do not strictly require scaled features, but scaling does no harm here.
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train = pd.DataFrame(scaler.fit_transform(X_train), columns = X.columns)
X_test = pd.DataFrame(scaler.transform(X_test), columns = X.columns)
pickle.dump(scaler, open('scaler.pkl','wb'))
As you can see, I used the "MinMaxScaler" class to scale the data. The scaler is fitted on the training set only and then applied unchanged to the test set.
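As a worked example of min-max scaling, the sketch below uses a single made-up feature (the values and the names `train`, `test`, and `scaler_demo` are hypothetical). Each value x is mapped to (x - min) / (max - min), where min and max come from the training data only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical single-feature data.
train = np.array([[2000.0], [2010.0], [2020.0]])
test = np.array([[2005.0], [2025.0]])

scaler_demo = MinMaxScaler()
train_scaled = scaler_demo.fit_transform(train)  # learns min=2000, max=2020
test_scaled = scaler_demo.transform(test)        # reuses the training min and max

print(train_scaled.ravel())  # 0.0, 0.5, 1.0
print(test_scaled.ravel())   # 0.25, 1.25
```

The second test value lands above 1 because it is larger than anything seen in training, which is expected when the scaler is fitted on the training set alone.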
Now, let us dive deep into the modelling part of the project.
from sklearn.tree import DecisionTreeRegressor
dt_model = DecisionTreeRegressor()
dt_model.fit(X_train, y_train)
y_pred = dt_model.predict(X_test)
dt_model.score(X_train, y_train)
I used scikit-learn's "DecisionTreeRegressor" class, which implements the decision tree algorithm for regression, to solve the problem.
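One thing to keep in mind when reading the score on the training set: a decision tree with no depth limit can fit its training data almost perfectly, so the score on held-out data is the more informative number. The sketch below illustrates this on synthetic data (the noisy sine data and all names in it are assumptions for illustration, not the project's data).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: a sine curve plus Gaussian noise.
rng = np.random.default_rng(0)
X_syn = rng.uniform(0, 10, size=(300, 1))
y_syn = np.sin(X_syn[:, 0]) + rng.normal(0, 0.3, size=300)

X_a, X_b, y_a, y_b = train_test_split(X_syn, y_syn, test_size=0.2, random_state=0)

# No depth limit: the tree keeps splitting until it reproduces the training targets.
model = DecisionTreeRegressor(random_state=0).fit(X_a, y_a)

train_r2 = model.score(X_a, y_a)  # R^2 on data the tree has already seen
test_r2 = model.score(X_b, y_b)   # R^2 on held-out data
print("train R^2:", train_r2)
print("test R^2:", test_r2)
```

If the gap between the two scores is large, limiting the tree with parameters such as `max_depth` or `min_samples_leaf` is a common remedy.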
Now let us have a look at the model's performance report.
print('R2 score:', metrics.r2_score(y_test, y_pred))
print('MAE:', metrics.mean_absolute_error(y_test, y_pred))
print('MSE:', metrics.mean_squared_error(y_test, y_pred))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
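For readers who want to see what these four numbers measure, here is a small sketch that computes them by hand with NumPy on made-up values and compares the results with scikit-learn (the arrays `y_true` and `y_hat` are hypothetical).

```python
import numpy as np
from sklearn import metrics

# Hypothetical true values and predictions.
y_true = np.array([3.0, 1.0, 2.0, 4.0])
y_hat = np.array([2.5, 1.0, 2.0, 5.0])

errors = y_true - y_hat

mae = np.mean(np.abs(errors))   # mean absolute error
mse = np.mean(errors ** 2)      # mean squared error
rmse = np.sqrt(mse)             # root mean squared error, in the target's units

# R^2 = 1 - (residual sum of squares) / (total sum of squares around the mean)
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print("MAE:", mae, "MSE:", mse, "RMSE:", rmse, "R2:", r2)

# The hand-computed values agree with scikit-learn's implementations.
print(np.isclose(mae, metrics.mean_absolute_error(y_true, y_hat)))
print(np.isclose(mse, metrics.mean_squared_error(y_true, y_hat)))
print(np.isclose(r2, metrics.r2_score(y_true, y_hat)))
```

An R2 of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean of the targets.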
As you can see from these metrics, the model performed well on the test data.
Thank you for your time.