
Production Quality Prediction


Model Overview

Importance of Product Quality


Product quality is critical because it affects a company's performance and helps build its reputation in the marketplace. Businesses that consistently deliver high-quality products that meet consumer expectations can reduce production costs, improve returns on investment, and increase revenue.


Customers value product quality because they rely on a company's attention to detail and responsiveness to customer demand. Companies manufacture products to meet market demand, and customers expect those products to perform as advertised. Customers also want products that help them form a bond with a brand, so they can trust what the company offers. High-quality products let customers solve their problems safely and effectively.




Roasting machine


The roasting machine is an aggregate consisting of five chambers of equal size, each fitted with three temperature sensors. In addition, data has been collected on the height of the raw-material layer and its moisture content; layer height and humidity are measured as the raw materials enter the machine. Raw materials take about an hour to pass through the kiln.




Why production quality prediction?


This project helps predict product quality from measurements taken as raw material enters the roasting machine. The use case lets factories identify good-quality raw materials and discard low-quality ones, so that only quality product is produced.



Dataset


The dataset includes 'Layer Height', 'Humidity', and the readings acquired by the temperature sensors inside the roasting machine.
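As a rough sketch, the dataframe might look like the following. The column names below (for example, T_data_1_1 for chamber 1, sensor 1) are assumptions for illustration only and should be checked against the actual file.


# Hypothetical column layout -- names are assumptions for illustration only.
# 5 chambers x 3 temperature sensors = 15 sensor columns, plus layer height,
# humidity, a timestamp, and the 'quality' target.
expected_columns = (
    [f'T_data_{chamber}_{sensor}' for chamber in range(1, 6) for sensor in range(1, 4)]
    + ['H_data', 'AH_data', 'date_time', 'quality']
)
print(expected_columns)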




XGBoost 


The XGBoost algorithm was created as part of a University of Washington research effort. 


In 2016, Tianqi Chen and Carlos Guestrin presented their paper at the SIGKDD Conference, and it quickly caught the attention of the machine learning community.


Since its introduction, the algorithm has been credited with winning a slew of Kaggle contests and serving as the brains behind several cutting-edge industrial applications.


Consequently, the XGBoost open-source project has a robust community of data scientists contributing to it, with around 350 contributors and 3,600 contributions on GitHub.


The following are some of the ways the algorithm distinguishes itself:



  1. A wide range of applications: Can be used to solve regression, classification, ranking, and user-defined prediction problems.

  2. Portability: Runs smoothly on Windows, Linux, and OS X.

  3. Languages: Supports all major programming languages, including C++, Python, R, Java, Scala, and Julia.

  4. Cloud Integration: Supports AWS, Azure, and Yarn clusters and works well with Flink, Spark, and other ecosystems.




Understanding the Code
First, let us import the necessary libraries for the project.


import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import norm
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import mean_squared_error as mse
from sklearn.metrics import r2_score
import joblib
import pickle


Now, let us load the required data into the system.


train_df = pd.read_csv('data_X.csv')
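Optionally, a quick sanity check helps confirm the data loaded as expected before we go further.


# Inspect the shape and the first few rows of the loaded dataframe
print(train_df.shape)
print(train_df.head())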


Before we start preprocessing, let us explore the data using a few visualizations.


plt.figure(figsize=(12, 5))
sns.distplot(train_df['quality'], fit=norm)  # note: distplot is deprecated in newer seaborn releases

# Get the fitted parameters used by the function
(mu, sigma) = norm.fit(train_df['quality'])

# Now plot the distribution (raw string so \mu and \sigma reach matplotlib intact)
plt.legend([r'Normal dist. ($\mu=$ {:.2f} and $\sigma=$ {:.2f} )'.format(mu, sigma)],
           loc='best')
plt.ylabel('Frequency')
plt.title('Quality distribution');


sns.pairplot(data=train_df);
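A correlation heatmap is another quick way to see how the sensor readings relate to 'quality'. This is an optional addition, not part of the original notebook.


# Optional: correlation heatmap over the numeric columns
plt.figure(figsize=(12, 8))
sns.heatmap(train_df.select_dtypes(include='number').corr(), cmap='coolwarm')
plt.title('Feature correlations')
plt.show()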



Moving on to the 'Data Preprocessing' part, let us check for missing values in the data.


train_df.isnull().sum()



As you can see, no missing values exist in our data.
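Had any columns contained missing values, a simple fill (for example, the column median) would be one option. The line below is a hypothetical fallback, not something this dataset needs.


# Hypothetical: fill numeric gaps with each column's median (not needed here)
train_df = train_df.fillna(train_df.median(numeric_only=True))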


Y = train_df['quality']
train_df.drop(['quality', 'date_time'], axis=1, inplace=True)

# train_test_split was already imported above; split 70/30 with a fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(train_df, Y, test_size=0.3, random_state=42)

As you can see, I used the "train_test_split" function to split our dataframe into training and testing sets. I also separated 'quality' out as the target and dropped the 'date_time' feature, as the model does not need it.

Finally, let us scale the data before feeding it into a model.


from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

X_train = pd.DataFrame(scaler.fit_transform(X_train), columns = train_df.columns)

X_test = pd.DataFrame(scaler.transform(X_test), columns = train_df.columns)

I used the "MinMaxScaler" function to scale the data.
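Strictly speaking, tree-based models such as XGBoost are largely insensitive to monotonic feature scaling, so this step is optional here; it does no harm, and it keeps the pipeline reusable with scale-sensitive models. Note also that the scaler is fitted on the training split only and then applied to the test split, which avoids leaking test-set statistics into training.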

Now, let us dive into the modelling part of the project.


from xgboost import XGBRegressor


xgb_model = XGBRegressor()
xgb_model.fit(X_train, y_train)
y_pred = xgb_model.predict(X_test)

# Note: this is the R^2 score on the *training* data, expressed as a percentage
xgb_model.score(X_train, y_train) * 100

As you can see, I used the "XGBRegressor" class to apply the "XGBoost" algorithm to our data.
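The default hyperparameters are a reasonable starting point. If you want to tune the model, a sketch like the following compares settings with cross-validation; the specific values below are illustrative assumptions, not tuned results for this dataset.


from sklearn.model_selection import cross_val_score

# Illustrative hyperparameters -- these values are assumptions, not tuned results
tuned_model = XGBRegressor(
    n_estimators=500,      # more, smaller boosting steps
    learning_rate=0.05,    # shrinkage applied to each step
    max_depth=6,           # depth of each tree
    subsample=0.8,         # row subsampling per tree
    colsample_bytree=0.8,  # column subsampling per tree
    random_state=42,
)

# 5-fold cross-validated R^2 on the training split
scores = cross_val_score(tuned_model, X_train, y_train, cv=5, scoring='r2')
print(scores.mean(), scores.std())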

Finally, let us check the model's performance using a few metrics. 


print('R2 score:', r2_score(y_test, y_pred))
print('MAE:', metrics.mean_absolute_error(y_test, y_pred))
print('MSE:', metrics.mean_squared_error(y_test, y_pred))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))


As the metrics above show, the model performs well on the held-out test data.
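Since joblib was imported at the top, a natural last step is to persist the trained model so it can be reloaded later for inference. The file name below is just an example.


# Save the trained model to disk (file name is arbitrary)
joblib.dump(xgb_model, 'xgb_quality_model.joblib')

# Later, reload it and predict on new, already-scaled data
loaded_model = joblib.load('xgb_quality_model.joblib')
preds = loaded_model.predict(X_test)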





Thank you for your time.




 

