About Dataset
Context
Abalone is the common name for a group of small to very large sea snails found along coasts across the world. They are used as a delicacy in many cuisines, and the leftover shell is fashioned into jewelry because of its iridescent luster. Because of this demand and economic value, abalone is often harvested in farms, which creates a need to predict an abalone's age from physical measurements. The traditional way to determine age is to cut the shell through the cone, stain it, and count the number of rings under a microscope, a boring and time-consuming task.
Data Description
Note: The number of rings is the value to predict, either as a continuous value (regression) or converted into a classification problem.
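One possible way to turn the ring count into a classification target is to bin it into age groups. The sketch below uses a toy Series in place of df['rings'], and the bin edges and the labels "young", "adult", and "old" are assumptions chosen only for illustration; they do not come from the dataset description.

```python
import pandas as pd

# Toy ring counts standing in for df['rings']; the values are illustrative only.
rings = pd.Series([4, 7, 9, 10, 12, 19], name="rings")

# Hypothetical bin edges (right-inclusive): (0, 8], (8, 10], (10, inf).
bins = [0, 8, 10, float("inf")]
labels = ["young", "adult", "old"]

# pd.cut assigns each ring count to the interval it falls in.
age_class = pd.cut(rings, bins=bins, labels=labels)
print(age_class.tolist())
```

The resulting categorical column could then be used as the target for a classifier instead of the numeric age.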
Given below are the attribute names, data types, measurement units, and brief descriptions.
Name            Data Type   Meas.   Description
--------------  ----------  ------  ----------------------------
Sex             nominal             M, F, and I (infant)
Length          continuous  mm      Longest shell measurement
Diameter        continuous  mm      Perpendicular to length
Height          continuous  mm      With meat in shell
Whole weight    continuous  grams   Whole abalone
Shucked weight  continuous  grams   Weight of meat
Viscera weight  continuous  grams   Gut weight (after bleeding)
Shell weight    continuous  grams   After being dried
Rings           integer             +1.5 gives the age in years
Dataset Link: https://www.kaggle.com/hurshd0/abalone-uci
Code:
Libraries:
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
import numpy as np
from sklearn.linear_model import LinearRegression
from xgboost import XGBRFRegressor
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('abalone_original.csv')
df.head()
Check for NULL Values:
df.isnull().sum()
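A null check only catches missing values. Measurements recorded as zero, which are physically implausible for a shell dimension or weight, would pass it unnoticed. The sketch below shows one way to count such zeros; the toy frame and its values are made up for illustration, and whether the real file contains any zero entries is not stated here.

```python
import pandas as pd

# Toy frame standing in for a few numeric columns of df; values are invented.
toy = pd.DataFrame({
    "length": [91, 70, 106],
    "height": [19, 0, 27],
    "whole-weight": [102.8, 45.1, 135.4],
})

# Count exact zeros per column.
zero_counts = (toy == 0).sum()
print(zero_counts)

# Rows containing at least one zero measurement, for manual inspection.
suspect_rows = toy[(toy == 0).any(axis=1)]
print(suspect_rows)
```

On the real data the same pattern would apply to the numeric columns of df only, since comparing the sex column to zero is not meaningful.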
Create the feature age, which is our target variable. The formula is:
Age = rings + 1.5 (in years)
df['age'] = df['rings'] + 1.5
EDA:
Sex Feature:
M-Male, F-Female, I-Infant
sns.countplot(x='sex',data=df);
plt.title("Distribution of Sex Feature",{'fontsize':25});
Other Features:
fig, axes = plt.subplots(1, 2, figsize=(10, 6), sharey=True);
sns.scatterplot(ax=axes[0],x=df.height,y=df.rings,color='b');
axes[0].set_title("Height Vs Rings")
sns.scatterplot(ax=axes[1],x=df.length,y=df.rings,color='b');
axes[1].set_title("Length Vs Rings");
fig, axes = plt.subplots(1, 2, figsize=(10, 6), sharey=True);
sns.scatterplot(ax=axes[0],x=df['whole-weight'],y=df.rings,color='b');
axes[0].set_title("Whole Weight Vs Rings")
sns.scatterplot(ax=axes[1],x=df['shucked-weight'],y=df.rings,color='b');
axes[1].set_title("Shucked weight Vs Rings");
fig, axes = plt.subplots(1, 2, figsize=(10, 6), sharey=True);
sns.scatterplot(ax=axes[0],x=df['viscera-weight'],y=df.rings,color='b');
axes[0].set_title("Viscera Weight Vs Rings")
sns.scatterplot(ax=axes[1],x=df['shell-weight'],y=df.rings,color='b');
axes[1].set_title("Shell-weight Vs Rings");
Split Data:
X = df.drop(['age','rings'],axis=1)
y = df.age
Convert Categorical Variable into Numeric:
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
X.sex = encoder.fit_transform(X.sex)
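LabelEncoder assigns integers to the sorted class labels, so F, I, and M become 0, 1, and 2. Tree-based models can usually cope with that, but a linear model will read it as an ordering F < I < M, which the nominal sex attribute does not have. One alternative is one-hot encoding; the sketch below uses pandas' get_dummies on a toy frame with invented values and is not the approach used in this walkthrough.

```python
import pandas as pd

# Toy frame standing in for X; the values are illustrative only.
toy = pd.DataFrame({"sex": ["M", "F", "I", "M"], "length": [91, 70, 66, 88]})

# One indicator column per category, so no artificial ordering is introduced.
encoded = pd.get_dummies(toy, columns=["sex"], dtype=int)
print(encoded.columns.tolist())
print(encoded)
```

For a linear model with an intercept, passing drop_first=True to get_dummies avoids a redundant indicator column.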
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y)
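Called without arguments, train_test_split holds out 25% of the rows and shuffles them differently on every run, so the RMSE values reported further down will vary between runs. A minimal sketch of a reproducible split follows, using toy arrays; the seed 42 is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for X and y: 20 samples, 2 features.
X_toy = np.arange(40).reshape(20, 2)
y_toy = np.arange(20)

# Fixing random_state makes the shuffle, and hence the split, repeatable.
split_a = train_test_split(X_toy, y_toy, test_size=0.25, random_state=42)
split_b = train_test_split(X_toy, y_toy, test_size=0.25, random_state=42)

X_tr, X_te, y_tr, y_te = split_a
print(X_tr.shape, X_te.shape)
print(np.array_equal(split_a[1], split_b[1]))
```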
Model Building:
Random Forest Regressor:
model1 = RandomForestRegressor()
model1.fit(X_train,y_train)
y_pred1 = model1.predict(X_test)
print(mean_squared_error(y_test, y_pred1) ** 0.5)
Output: 2.1891808129021566
Linear Regression:
model2 = LinearRegression()
model2.fit(X_train,y_train)
y_pred2 = model2.predict(X_test)
print(mean_squared_error(y_test, y_pred2) ** 0.5)
Output: 2.226173694007794
XGBoost Random Forest Regressor:
model3 = XGBRFRegressor()
model3.fit(X_train,y_train)
y_pred3 = model3.predict(X_test)
print(mean_squared_error(y_test, y_pred3) ** 0.5)
Output: 2.2031045005229757
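The RMSE values above are easier to judge against a naive baseline that always predicts the mean age of the training set. The sketch below shows the calculation on small invented arrays; the helper name rmse and the toy numbers are assumptions for illustration, not results from the abalone data.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy ages standing in for y_train and y_test.
y_train_toy = np.array([8.5, 10.5, 11.5, 9.5])
y_test_toy = np.array([9.0, 11.0, 13.0])

# Baseline: predict the training mean for every test sample.
baseline_pred = np.full_like(y_test_toy, y_train_toy.mean())
baseline_rmse = rmse(y_test_toy, baseline_pred)
print(baseline_rmse)
```

On the real data, rmse(y_test, np.full(len(y_test), y_train.mean())) would give the number that each model needs to beat.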
Feature Importance:
plt.figure(figsize=(9,7))
feature_imp1 = model1.feature_importances_
sns.barplot(x=feature_imp1, y=X.columns)
# Add labels to your graph
plt.xlabel('Feature Importance Score')
plt.ylabel('Features')
plt.title("Visualizing Important Features For Random Forest Regressor",{'fontsize':25})
plt.show();
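The feature_importances_ attribute of a random forest is impurity-based and computed on the training data. A complementary check is permutation importance on held-out data, which measures how much the score drops when one feature's values are shuffled. The sketch below uses scikit-learn's permutation_importance on synthetic data in which only the first feature matters; the data, the sizes, and the seeds are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic data: the target depends only on feature 0, plus a little noise.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
y_toy = 3.0 * X_toy[:, 0] + 0.1 * rng.normal(size=200)

# Simple hold-out split: first 150 rows for fitting, the rest for evaluation.
X_fit, X_held = X_toy[:150], X_toy[150:]
y_fit, y_held = y_toy[:150], y_toy[150:]

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_fit, y_fit)

# Shuffle each feature 5 times and record the mean drop in R^2 on held-out data.
result = permutation_importance(forest, X_held, y_held, n_repeats=5, random_state=0)
print(result.importances_mean)
```

With the models above, the equivalent call would be permutation_importance(model1, X_test, y_test), and the resulting means could be plotted against X.columns in the same way as the bar plot.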
Thank You :)