
## Predicting the age of Abalone

### Model Overview

Context

Abalone is the common name for a group of small to very large sea snails found along coasts around the world. They are eaten as a delicacy in many cuisines, and the leftover shell is fashioned into jewelry because of its iridescent luster. Because of this demand and economic value, abalone is often raised on farms, which creates a need to predict an abalone's age from physical measurements. The traditional way to determine age is to cut the shell through the cone, stain it, and count the number of rings under a microscope, a tedious and time-consuming task.

Data Description

• From the original data, examples with missing values were removed (the majority having the predicted value missing), and the ranges of the continuous values were scaled for use with an ANN (by dividing by 200).

• Number of instances: 4177

• Number of attributes: 8

• Features: Sex, Length, Diameter, Height, Whole weight, Shucked weight, Viscera weight, and Shell weight

• Target: Rings

Note: The number of rings is the value to predict, either as a continuous value (regression) or, after binning, as a classification problem.

• Attribute Information:

The table below gives each attribute's name, data type, unit of measurement, and a brief description.

| Name | Data Type | Unit | Description |
|---|---|---|---|
| Sex | nominal | | M, F, and I (infant) |
| Length | continuous | mm | Longest shell measurement |
| Diameter | continuous | mm | Perpendicular to length |
| Height | continuous | mm | With meat in shell |
| Whole weight | continuous | grams | Whole abalone |
| Shucked weight | continuous | grams | Weight of meat |
| Viscera weight | continuous | grams | Gut weight (after bleeding) |
| Shell weight | continuous | grams | After being dried |
| Rings | integer | | +1.5 gives the age in years |
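The note above mentions that the ring count can also be treated as a classification target. A minimal sketch of one way to do that with `pandas.cut` follows. The bin edges and the class labels (`young`, `adult`, `old`) are illustrative assumptions, not part of the dataset, and the ring counts are made-up values.

```python
import pandas as pd

# Toy ring counts standing in for the real `rings` column (illustrative values).
rings = pd.Series([4, 7, 9, 10, 11, 15, 22], name="rings")

# Hypothetical bin edges giving the intervals (0, 8], (8, 11], (11, 30].
# The right edge of each interval is inclusive by default.
age_class = pd.cut(rings, bins=[0, 8, 11, 30], labels=["young", "adult", "old"])

print(age_class.tolist())
```

With a class column like this, a classifier can be trained in place of the regressors used later. How many bins to use, and where to place them, depends on what the prediction is for.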

Dataset Link: https://www.kaggle.com/hurshd0/abalone-uci

Code:

Libraries:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# XGBRFRegressor is XGBoost's random-forest variant (not the boosted XGBRegressor)
from xgboost import XGBRFRegressor
```

```python
df = pd.read_csv('abalone_original.csv')
```

Check for NULL Values:

``df.isnull().sum()``

Missing Values: None
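To make the null check above concrete, here is a small sketch on a toy frame (made-up values, not rows from the dataset). It shows that `isnull().sum()` counts only NaN/None entries per column, so a measurement recorded as zero would pass this check. Whether the abalone file contains such zeros is not stated here; the second check is only an assumption about what might be worth looking at.

```python
import numpy as np
import pandas as pd

# Toy frame with one NaN and one zero (illustrative values only).
toy = pd.DataFrame({
    "length": [0.455, np.nan, 0.530],
    "height": [0.095, 0.090, 0.0],
})

# isnull() flags NaN/None only; sum() counts them per column.
null_counts = toy.isnull().sum()

# A zero measurement is not null, so it needs a separate comparison.
zero_counts = (toy == 0).sum()

print(null_counts)
print(zero_counts)
```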

Create the feature `age`, which is our target variable. It is derived from the ring count as follows:

Age = rings + 1.5 (age in years)

``df['age'] = df['rings'] + 1.5``

EDA:

Sex Feature:

M-Male, F-Female, I-Infant

```python
# Count of samples in each sex category
sns.countplot(x='sex', data=df)
plt.title("Distribution of Sex Feature", fontdict={'fontsize': 25})
```

Other Features:

```python
# plt.subplots(1, 2) returns an array of two Axes, so each plot needs its own index
fig, axes = plt.subplots(1, 2, figsize=(10, 6), sharey=True)
sns.scatterplot(ax=axes[0], x=df.height, y=df.rings, color='b')
axes[0].set_title("Height Vs Rings")
sns.scatterplot(ax=axes[1], x=df.length, y=df.rings, color='b')
axes[1].set_title("Length Vs Rings")
```

```python
fig, axes = plt.subplots(1, 2, figsize=(10, 6), sharey=True)
sns.scatterplot(ax=axes[0], x=df['whole-weight'], y=df.rings, color='b')
axes[0].set_title("Whole Weight Vs Rings")
sns.scatterplot(ax=axes[1], x=df['shucked-weight'], y=df.rings, color='b')
axes[1].set_title("Shucked Weight Vs Rings")
```

```python
fig, axes = plt.subplots(1, 2, figsize=(10, 6), sharey=True)
sns.scatterplot(ax=axes[0], x=df['viscera-weight'], y=df.rings, color='b')
axes[0].set_title("Viscera Weight Vs Rings")
sns.scatterplot(ax=axes[1], x=df['shell-weight'], y=df.rings, color='b')
axes[1].set_title("Shell Weight Vs Rings")
```

Split Data:

```python
# Drop the target and `rings`; age is rings + 1.5, so keeping rings would leak the target
X = df.drop(['age', 'rings'], axis=1)
y = df.age
```

Convert Categorical Variable into Numeric:

```python
from sklearn.preprocessing import LabelEncoder

# Map the sex labels (F, I, M) to integer codes
encoder = LabelEncoder()
X.sex = encoder.fit_transform(X.sex)
```
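`LabelEncoder` assigns integer codes in sorted label order, so `F`, `I`, and `M` become 0, 1, and 2. Because sex is a nominal feature, a linear model may read that numbering as an ordering. One-hot encoding is a common alternative. The sketch below compares the two on a toy column; it uses `pd.get_dummies` as a swapped-in technique and is not what the walkthrough above does.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy sex column (illustrative values).
sex = pd.Series(["M", "F", "I", "M"], name="sex")

# Label encoding: one integer per category, in sorted label order.
encoder = LabelEncoder()
codes = encoder.fit_transform(sex)
print(list(encoder.classes_))
print(codes.tolist())

# One-hot encoding: one indicator column per category, no implied order.
one_hot = pd.get_dummies(sex, prefix="sex")
print(one_hot.columns.tolist())
```

Tree-based models are usually less sensitive to this choice than linear regression, so the difference may matter most for the linear model.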

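```python
from sklearn.model_selection import train_test_split

# Default split is 75% train / 25% test. No random_state is set,
# so the split (and the scores reported below) will vary between runs.
X_train, X_test, y_train, y_test = train_test_split(X, y)
```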
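Since the split above is unseeded, rerunning the notebook gives different train and test rows each time. A minimal sketch of a reproducible split follows, using synthetic arrays in place of the abalone features. The names `X_demo` and `y_demo` and the seed `42` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the feature matrix and target (illustrative only).
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.arange(100)

# test_size=0.25 matches scikit-learn's default; random_state fixes the shuffle.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)

# Repeating the call with the same seed returns the same rows.
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)

print(X_tr.shape, X_te.shape)
print(np.array_equal(X_te, X_te2))
```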

Model Building:

Random Forest Regressor:

```python
model1 = RandomForestRegressor()
model1.fit(X_train, y_train)
y_pred1 = model1.predict(X_test)
# Square root of the MSE gives the RMSE, in years
print(mean_squared_error(y_test, y_pred1) ** 0.5)
```

Output: 2.1891808129021566
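The printed value is the root mean squared error (RMSE): the square root of the average squared difference between predicted and actual age. It is in the same unit as the target, so a value near 2.19 corresponds to a typical error of roughly 2.2 years. The sketch below works through the calculation by hand on three made-up ages and checks it against the `mean_squared_error(...) ** 0.5` expression used above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up actual and predicted ages in years (illustrative values).
y_true = np.array([10.5, 8.5, 12.5])
y_hat = np.array([9.5, 8.5, 14.5])

# Errors are 1, 0, -2; squared errors are 1, 0, 4; their mean is 5/3.
rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))

# Same quantity via scikit-learn, as in the walkthrough.
rmse_sklearn = mean_squared_error(y_true, y_hat) ** 0.5

print(rmse_manual, rmse_sklearn)
```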

Linear Regression:

```python
model2 = LinearRegression()
model2.fit(X_train, y_train)
y_pred2 = model2.predict(X_test)
print(mean_squared_error(y_test, y_pred2) ** 0.5)
```

Output: 2.226173694007794

XGBoost Regressor:

```python
# Note: XGBRFRegressor trains a random forest with XGBoost, not a gradient-boosted model
model3 = XGBRFRegressor()
model3.fit(X_train, y_train)
y_pred3 = model3.predict(X_test)
print(mean_squared_error(y_test, y_pred3) ** 0.5)
```

Output: 2.2031045005229757
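The three RMSE values above differ by less than 0.04 and come from a single unseeded split, so the ranking may not be stable from run to run. One way to get a steadier comparison is k-fold cross-validation. The sketch below assumes synthetic data from `make_regression` in place of the abalone features and compares only the two scikit-learn models. The fold count, the seeds, and `n_estimators=50` are arbitrary choices to keep the example fast.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression problem with 8 features (stand-in for the real data).
X_syn, y_syn = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

candidates = {
    "RandomForestRegressor": RandomForestRegressor(n_estimators=50, random_state=0),
    "LinearRegression": LinearRegression(),
}

cv_rmse = {}
for name, model in candidates.items():
    # The scorer returns negative RMSE (higher is better), so flip the sign.
    scores = cross_val_score(
        model, X_syn, y_syn, cv=5, scoring="neg_root_mean_squared_error"
    )
    cv_rmse[name] = -scores.mean()
    print(f"{name}: mean RMSE = {cv_rmse[name]:.3f} (std {scores.std():.3f})")
```

On the real data, the same loop could take `X` and `y` from the split step, and `XGBRFRegressor()` could be added to `candidates` in the same way.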

Feature Importance:

``````plt.figure(figsize=(9,7))
feature_imp1 = model1.feature_importances_
sns.barplot(x=feature_imp1, y=X.columns) 