Predicting the age of Abalone

Prasad Chaskar


About Dataset

Context

Abalone is the common name for a group of small to very large sea snails found along coasts across the world. The meat is eaten as a delicacy in many cuisines, and the leftover shell is fashioned into jewelry because of its iridescent luster. Because of this demand and economic value, abalone is often raised and harvested on farms, which creates the need to predict an abalone's age from physical measurements. The traditional approach to determining age is to cut the shell through the cone, stain it, and count the number of rings under a microscope, which is a tedious and time-consuming task.

Data Description

From the original data, examples with missing values were removed (the majority having the predicted value missing), and the ranges of the continuous values were scaled for use with an ANN (by dividing by 200).
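To make the scaling concrete, here is a minimal sketch of undoing it, assuming a file whose continuous columns were divided by 200 as described. The two rows below are illustrative values rather than a real load of the data, and the column names simply mirror the ones used in the code later on.

```python
import pandas as pd

SCALE = 200  # divisor stated in the data description

# Illustrative scaled measurements; a real file may or may not be scaled.
scaled = pd.DataFrame({
    "length": [0.455, 0.350],
    "whole-weight": [0.5140, 0.2255],
})

# Multiplying by the divisor recovers millimetres and grams.
original = scaled * SCALE
print(original)
```

If the loaded file already holds values in millimetres and grams, this step is not needed.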

Number of instances: 4177

Number of attributes: 8

Features: Sex, Length, Diameter, Height, Whole weight, Shucked weight, Viscera weight, and Shell weight

Target: Rings

Note: The number of rings is the value to predict, either as a continuous value (regression) or, after conversion into groups, as a classification target.
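One way the conversion to a classification problem might look is to bin the ring counts into age groups. The bin edges and labels below are assumptions chosen for illustration, not taken from the dataset description, and the ring counts are a made-up sample.

```python
import numpy as np
import pandas as pd

# Made-up ring counts; the real values come from the `rings` column.
rings = pd.Series([4, 7, 9, 10, 11, 15, 22], name="rings")

# Assumed grouping: young (<= 8 rings), adult (9-10), old (>= 11).
age_group = pd.cut(
    rings,
    bins=[0, 8, 10, np.inf],
    labels=["young", "adult", "old"],
    include_lowest=True,
)
print(age_group.tolist())
```

The resulting categorical column can then serve as the target for any classifier in place of the numeric ring count.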

Attribute Information:

Given below are the attribute name, data type, unit of measurement, and a brief description.

Name             Data Type    Meas.   Description
----             ---------    -----   -----------
Sex              nominal              M, F, and I (infant)
Length           continuous   mm      Longest shell measurement
Diameter         continuous   mm      perpendicular to length
Height           continuous   mm      with meat in shell
Whole weight     continuous   grams   whole abalone
Shucked weight   continuous   grams   weight of meat
Viscera weight   continuous   grams   gut weight (after bleeding)
Shell weight     continuous   grams   after being dried
Rings            integer              +1.5 gives the age in years

Dataset Link: https://www.kaggle.com/hurshd0/abalone-uci

Code: 
Libraries:

import pandas as pd

from sklearn.ensemble import RandomForestRegressor

from sklearn.metrics import mean_squared_error

import numpy as np

from sklearn.linear_model import LinearRegression

from xgboost import XGBRFRegressor

import matplotlib.pyplot as plt

import seaborn as sns

Load Dataset:

df = pd.read_csv('abalone_original.csv')

df.head()

Check for NULL Values:

df.isnull().sum()

Missing Values: None

Create the feature age, which is our target variable. The formula is as follows:

Age (in years) = rings + 1.5

df['age'] = df.rings + 1.5

EDA:
Sex Feature:
M-Male, F-Female, I-Infant

sns.countplot(x='sex',data=df);

plt.title("Distribution of Sex Feature",{'fontsize':25});

Other Features:

fig, axes = plt.subplots(1, 2, figsize=(10, 6), sharey=True);

sns.scatterplot(ax=axes[0],x=df.height,y=df.rings,color='b');

axes[0].set_title("Height Vs Rings")

sns.scatterplot(ax=axes[1],x=df.length,y=df.rings,color='b');

axes[1].set_title("Length Vs Rings");

fig, axes = plt.subplots(1, 2, figsize=(10, 6), sharey=True);

sns.scatterplot(ax=axes[0],x=df['whole-weight'],y=df.rings,color='b');

axes[0].set_title("Whole Weight Vs Rings")

sns.scatterplot(ax=axes[1],x=df['shucked-weight'],y=df.rings,color='b');

axes[1].set_title("Shucked weight Vs Rings");

fig, axes = plt.subplots(1, 2, figsize=(10, 6), sharey=True);

sns.scatterplot(ax=axes[0],x=df['viscera-weight'],y=df.rings,color='b');

axes[0].set_title("Viscera Weight Vs Rings")

sns.scatterplot(ax=axes[1],x=df['shell-weight'],y=df.rings,color='b');

axes[1].set_title("Shell-weight Vs Rings");

Split Data:

X = df.drop(['age','rings'],axis=1)

y = df.age

Convert Categorical Variable into Numeric:

from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()

X.sex = encoder.fit_transform(X.sex)
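LabelEncoder assigns integer codes in sorted order (F=0, I=1, M=2), which a linear model may read as an ordering even though sex is a nominal feature. As a possible alternative, not the approach used in this report, the sketch below one-hot encodes the column with pandas. The small frame is hypothetical and only reuses the dataset's column names.

```python
import pandas as pd

# Hypothetical mini-frame using the same column names as the dataset.
X_demo = pd.DataFrame({
    "sex": ["M", "F", "I", "M"],
    "length": [91, 70, 66, 88],
})

# One indicator column per category, so no order among F, I, M is implied.
X_onehot = pd.get_dummies(X_demo, columns=["sex"], dtype=int)
print(X_onehot.columns.tolist())
```

Tree-based models are usually less sensitive to this choice than linear regression, so the difference, if any, would most likely show up in the linear model.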

from sklearn.model_selection import train_test_split

X_train,X_test,y_train,y_test = train_test_split(X,y)
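Called without arguments, train_test_split holds out 25% of the rows and shuffles them differently on every run, so the error values reported later will vary slightly between runs. A minimal sketch of a repeatable split follows. The seed value 42 is an arbitrary assumption, and the arrays are placeholders standing in for X and y.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the feature matrix and target.
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.arange(20)

# Fixing random_state makes the shuffle, and hence the split, repeatable.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)
print(X_te.shape, (X_te == X_te2).all())
```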

Model Building:
Random Forest Regressor:

model1 = RandomForestRegressor()

model1.fit(X_train,y_train)

y_pred1 = model1.predict(X_test)

print(mean_squared_error(y_test, y_pred1) ** 0.5)

Output: 2.1891808129021566
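The `** 0.5` in the print statement takes the square root of the mean squared error, so the reported number is a root mean squared error (RMSE), expressed in years like the age target. The sketch below repeats the calculation by hand on made-up values (not taken from the dataset) to show that the two forms agree.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up ages for illustration; not taken from the abalone data.
y_true = np.array([3.0, 5.0, 7.0])
y_hat = np.array([2.0, 5.0, 9.0])

# Same expression as in the report: square root of the MSE.
rmse_sklearn = mean_squared_error(y_true, y_hat) ** 0.5

# Written out: square the errors, average them, take the root.
rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))
print(rmse_sklearn, rmse_manual)
```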

Linear Regression:

model2 = LinearRegression()

model2.fit(X_train,y_train)

y_pred2 = model2.predict(X_test)

print(mean_squared_error(y_test, y_pred2) ** 0.5)

Output: 2.226173694007794

XGBoost Random Forest Regressor (XGBRFRegressor):

model3 = XGBRFRegressor()

model3.fit(X_train,y_train)

y_pred3 = model3.predict(X_test)

print(mean_squared_error(y_test, y_pred3) ** 0.5)

Output: 2.2031045005229757
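The three scores above come from a single random split and differ by only a few hundredths, so the ranking could change with a different split. One way to get a steadier comparison is k-fold cross-validation. The sketch below is an illustration under assumptions: it uses synthetic data from make_regression rather than the abalone file, covers only the two scikit-learn models, and picks 5 folds arbitrarily.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for X and y; replace with the real features and age.
X_demo, y_demo = make_regression(
    n_samples=200, n_features=8, noise=10.0, random_state=0
)

models = {
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "linear_regression": LinearRegression(),
}

cv_rmse = {}
for name, model in models.items():
    # The scorer returns negative RMSE, so flip the sign after averaging.
    scores = cross_val_score(
        model, X_demo, y_demo, cv=5, scoring="neg_root_mean_squared_error"
    )
    cv_rmse[name] = -scores.mean()
    print(f"{name}: {cv_rmse[name]:.3f}")
```

On the real data, the same loop would take X and y from the split step in place of the synthetic arrays.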

Feature Importance:

plt.figure(figsize=(9,7))

feature_imp1 = model1.feature_importances_

sns.barplot(x=feature_imp1, y=X.columns)

# Add labels to your graph

plt.xlabel('Feature Importance Score')

plt.ylabel('Features')

plt.title("Visualizing Important Features For Random Forest Regressor",{'fontsize':25})

plt.show();
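The feature_importances_ attribute used above is impurity-based and computed from the training data. A complementary check, offered here only as a sketch, is permutation importance on held-out data, which measures how much the score drops when one column is shuffled. The example uses synthetic data in which only the first two columns carry signal. The sample size, noise level, and seeds are arbitrary assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: with shuffle=False only the first two columns are informative.
X_demo, y_demo = make_regression(
    n_samples=300, n_features=5, n_informative=2,
    noise=5.0, shuffle=False, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Shuffle each held-out column in turn and record the mean drop in R^2.
result = permutation_importance(
    forest, X_te, y_te, n_repeats=5, random_state=0
)
for idx in result.importances_mean.argsort()[::-1]:
    print(f"feature {idx}: {result.importances_mean[idx]:.3f}")
```

With the report's objects, the call would take model1, X_test, and y_test instead of the synthetic ones.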

Thank You :)
