Prasad Chaskar's other Models Reports


Predicting the age of Abalone


Model Overview



About Dataset


Context


Abalone is the common name for any of a group of small to very large sea snails, commonly found along coasts across the world. They are eaten as a delicacy in many cuisines, and the leftover shell is fashioned into jewelry because of its iridescent luster. Because of this demand and economic value, abalone is often raised on farms, which creates the need to predict an abalone's age from physical measurements. The traditional way to determine age is to cut the shell through the cone, stain it, and count the number of rings under a microscope, which is a boring and time-consuming task.


Data Description



  • From the original data, examples with missing values were removed (the majority having the predicted value missing),
    and the ranges of the continuous values were scaled for use with an ANN (by dividing by 200).

  • Number of instances: 4177

  • Number of attributes: 8

  • Features: Sex, Length, Diameter, Height, Whole weight, Shucked weight, Viscera weight,
    and Shell weight

  • Target: Rings
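The "dividing by 200" step in the description can be undone with a single multiplication. The sketch below is only an illustration: the numbers are toy values, and it assumes a copy of the data that holds the scaled measurements. The file loaded later in this article is named abalone_original.csv, so it may already be unscaled.

```python
import pandas as pd

# Toy scaled measurements for illustration only; not read from the dataset file.
scaled = pd.DataFrame({
    "length": [0.455, 0.350],
    "whole-weight": [0.514, 0.2255],
})

SCALE = 200  # the divisor named in the data description

# Multiply by the same constant to recover millimetres and grams.
original_units = scaled * SCALE
print(original_units)
```

If the measurements are already in millimetres and grams, this step should be skipped.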


Note: The number of rings is the value to predict, either directly as a continuous value (regression) or, after grouping the ring counts into classes, as a classification problem.
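As a minimal sketch of the classification option mentioned in the note, ring counts can be grouped into classes with pandas. The bin edges and class names here are hypothetical choices made for illustration; the article itself only builds regression models.

```python
import pandas as pd

# Toy ring counts for illustration only; not taken from the dataset.
rings = pd.Series([4, 7, 9, 10, 12, 18], name="rings")

# Hypothetical bin edges: (0, 8] -> young, (8, 10] -> adult, above 10 -> old.
bins = [0, 8, 10, float("inf")]
labels = ["young", "adult", "old"]

# pd.cut assigns each ring count to the interval it falls into.
age_class = pd.cut(rings, bins=bins, labels=labels)
print(age_class.tolist())
```

The resulting column could then serve as the target for a classifier instead of a regressor.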



  • Attribute Information:


Given below are the attribute name, data type, unit of measurement, and a brief description.


Name            Data Type   Meas.   Description
----            ---------   -----   -----------
Sex             nominal             M, F, and I (infant)
Length          continuous  mm      Longest shell measurement
Diameter        continuous  mm      Perpendicular to length
Height          continuous  mm      With meat in shell
Whole weight    continuous  grams   Whole abalone
Shucked weight  continuous  grams   Weight of meat
Viscera weight  continuous  grams   Gut weight (after bleeding)
Shell weight    continuous  grams   After being dried
Rings           integer             +1.5 gives the age in years




Dataset Link: https://www.kaggle.com/hurshd0/abalone-uci

Code:

Libraries:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
import numpy as np
from sklearn.linear_model import LinearRegression
from xgboost import XGBRFRegressor
import matplotlib.pyplot as plt
import seaborn as sns

Load Dataset:


df = pd.read_csv('abalone_original.csv')
df.head()


Check for NULL Values: 


df.isnull().sum()

Missing Values: None
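isnull() only reports missing cells. It does not flag values that are present but physically implausible, such as a height of zero. The article does not say whether this file contains any such rows, so the sketch below uses a toy frame (made-up values, lowercase column names assumed) to show one way to check.

```python
import pandas as pd

# Toy rows for illustration only; the second row has an implausible zero height.
toy = pd.DataFrame({
    "length": [0.455, 0.350, 0.430],
    "height": [0.095, 0.000, 0.140],
    "rings": [15, 7, 10],
})

numeric = toy.select_dtypes("number")

# Count non-positive entries per numeric column; isnull() would report 0 for all of them.
non_positive = (numeric <= 0).sum()
print(non_positive)

# Keep only rows where every numeric measurement is strictly positive.
clean = toy[(numeric > 0).all(axis=1)]
print(len(clean))
```

Whether to drop or impute such rows is a judgment call that depends on how many there are.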




Create the feature age, which is our target variable. The formula is:


Age = rings + 1.5 (age in years)




df['age'] = df['rings'] + 1.5
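Because age is just the ring count shifted by a constant, a model's error is the same whether it is measured on rings or on age. The toy numbers below are made up purely to illustrate that point.

```python
import numpy as np

# Toy true and predicted ring counts; illustration only.
rings_true = np.array([7, 9, 10, 15])
rings_pred = np.array([8.0, 9.5, 9.0, 13.0])

def rmse(actual, predicted):
    """Root mean squared error between two arrays."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

rmse_rings = rmse(rings_true, rings_pred)
# Adding 1.5 to both sides leaves every residual unchanged.
rmse_age = rmse(rings_true + 1.5, rings_pred + 1.5)

print(rmse_rings, rmse_age)  # both 1.25
```

So the RMSE values reported later can be read either as years of age or as ring counts.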


EDA:
Sex Feature:
M - Male, F - Female, I - Infant


sns.countplot(x='sex',data=df);
plt.title("Distribution of Sex Feature",{'fontsize':25});



Other Features:


fig, axes = plt.subplots(1, 2, figsize=(10, 6), sharey=True);
sns.scatterplot(ax=axes[0],x=df.height,y=df.rings,color='b');
axes[0].set_title("Height Vs Rings")
sns.scatterplot(ax=axes[1],x=df.length,y=df.rings,color='b');
axes[1].set_title("Length Vs Rings");




fig, axes = plt.subplots(1, 2, figsize=(10, 6), sharey=True);
sns.scatterplot(ax=axes[0],x=df['whole-weight'],y=df.rings,color='b');
axes[0].set_title("Whole Weight Vs Rings")
sns.scatterplot(ax=axes[1],x=df['shucked-weight'],y=df.rings,color='b');
axes[1].set_title("Shucked weight Vs Rings");




fig, axes = plt.subplots(1, 2, figsize=(10, 6), sharey=True);
sns.scatterplot(ax=axes[0],x=df['viscera-weight'],y=df.rings,color='b');
axes[0].set_title("Viscera Weight Vs Rings")
sns.scatterplot(ax=axes[1],x=df['shell-weight'],y=df.rings,color='b');
axes[1].set_title("Shell Weight Vs Rings");


Split Data: 


X = df.drop(['age','rings'],axis=1)
y = df.age

Convert Categorical Variable into Numeric:


from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
X.sex = encoder.fit_transform(X.sex)

from sklearn.model_selection import train_test_split
# Default split: 75% train, 25% test. No random_state is set, so results vary between runs.
X_train,X_test,y_train,y_test = train_test_split(X,y)
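LabelEncoder maps M, F, and I to integers, which implies an ordering that a nominal feature does not have. This matters mostly for the linear model. One alternative, not used in this article, is one-hot encoding together with a fixed random_state so the split is repeatable. The sketch below uses a toy frame with made-up values; the seed 42 and the test_size are arbitrary choices.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame for illustration only; values are made up.
toy = pd.DataFrame({
    "sex": ["M", "F", "I", "M", "F", "I", "M", "F"],
    "length": [0.45, 0.53, 0.33, 0.44, 0.55, 0.42, 0.47, 0.54],
    "age": [16.5, 10.5, 8.5, 11.5, 21.5, 9.5, 10.5, 17.5],
})

# One-hot encode the nominal column: one indicator column per category.
X_toy = pd.get_dummies(toy.drop(columns="age"), columns=["sex"])
y_toy = toy["age"]

# A fixed random_state makes the split, and therefore the scores, repeatable.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=42
)
print(list(X_toy.columns))
```

Tree-based models are usually less sensitive to the integer encoding, so the change is most relevant when comparing against LinearRegression.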

Model Building:
Random Forest Regressor:


model1 = RandomForestRegressor()
model1.fit(X_train,y_train)
y_pred1 = model1.predict(X_test)
# Root mean squared error (RMSE) on the test set, in years
print(mean_squared_error(y_test, y_pred1) ** 0.5)

Output: 2.1891808129021566

Linear Regression: 


model2 = LinearRegression()
model2.fit(X_train,y_train)
y_pred2 = model2.predict(X_test)
print(mean_squared_error(y_test, y_pred2) ** 0.5)

Output: 2.226173694007794



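XGBoost Random Forest Regressor (XGBRFRegressor, the random-forest variant in XGBoost rather than the gradient-boosted XGBRegressor):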

model3 = XGBRFRegressor()
model3.fit(X_train,y_train)
y_pred3 = model3.predict(X_test)
print(mean_squared_error(y_test, y_pred3) ** 0.5)

Output: 2.2031045005229757
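The three RMSE values above are close to each other and come from a single random split, so differences in the second decimal place may not be meaningful. A rough sketch of a steadier comparison is k-fold cross-validation against a predict-the-mean baseline. The data here is synthetic and the helper cv_rmse is a hypothetical name, so the numbers it prints say nothing about the abalone data.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 1, size=(200, 3))
y_demo = 10 * X_demo[:, 0] + 5 * X_demo[:, 1] + rng.normal(0, 1, size=200)

def cv_rmse(model, X, y, folds=5):
    """Mean RMSE over k folds (scikit-learn reports it negated)."""
    scores = cross_val_score(
        model, X, y, cv=folds, scoring="neg_root_mean_squared_error"
    )
    return -scores.mean()

# Baseline: always predict the mean of the training target.
baseline_rmse = cv_rmse(DummyRegressor(strategy="mean"), X_demo, y_demo)
linear_rmse = cv_rmse(LinearRegression(), X_demo, y_demo)
print(f"baseline: {baseline_rmse:.3f}  linear: {linear_rmse:.3f}")
```

The same helper could be called with RandomForestRegressor() or XGBRFRegressor() and the article's X and y to compare all three models on equal footing.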

Feature Importance:


plt.figure(figsize=(9,7))
feature_imp1 = model1.feature_importances_
sns.barplot(x=feature_imp1, y=X.columns)
# Add labels to your graph
plt.xlabel('Feature Importance Score')
plt.ylabel('Features')
plt.title("Visualizing Important Features For Random Forest Regressor",{'fontsize':25})
plt.show();
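feature_importances_ on a random forest is computed from the training data and can favor some features over others. A common cross-check, not used in this article, is permutation importance on held-out data. The sketch below uses synthetic data in which only the first column drives the target, so the expected ranking is known in advance.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic data for illustration only: just column 0 influences the target.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 1, size=(300, 3))
y_demo = 10 * X_demo[:, 0] + rng.normal(0, 0.5, size=300)

# Fit on the first 200 rows, evaluate importance on the remaining 100.
X_fit, X_val = X_demo[:200], X_demo[200:]
y_fit, y_val = y_demo[:200], y_demo[200:]
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_fit, y_fit)

# Shuffle one column at a time and measure how much the score drops.
result = permutation_importance(forest, X_val, y_val, n_repeats=5, random_state=0)
print(result.importances_mean)
```

With the article's objects, the call would take model1, X_test, and y_test in place of the synthetic arrays.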


Thank You :)




