
Mushroom Classification


Model Overview

Mushrooms:


Mushrooms have been consumed since early history. Ancient Greeks believed that mushrooms gave soldiers strength in battle, and the Romans regarded them as the "Food of the Gods." Chinese culture has treasured mushrooms for centuries as a healthy food, an 'elixir of life.' They have been part of human culture and have attracted considerable interest in the great civilizations of history because of their sensory characteristics and attractive culinary attributes. Nowadays, mushrooms are popular, valuable foods because they are low in calories, carbohydrates, fat, and sodium, and they are cholesterol-free. They also provide important nutrients, including selenium, potassium, riboflavin, niacin, vitamin D, protein, and fiber. With a long history as a food source, mushrooms are also valued in traditional medicine for their healing properties, and beneficial effects on health and in the treatment of some diseases have been reported.



Poisonous Mushrooms:


Mushroom poisonings can occur when foragers misidentify a poisonous species as edible, although many cases are intentional ingestions. Symptoms range from benign, generalized gastrointestinal upset to potentially devastating manifestations, including liver failure, kidney failure, and neurologic sequelae. Up to 14 described syndromes can manifest, depending on the species, the toxins, and the amount ingested.




Project Implementation:


The agriculture sector can use this project to distinguish 'toxic' mushrooms from 'edible' ones and so prevent people from consuming poisonous mushrooms and falling sick. Likewise, organizations can use the project to weed out poisonous mushrooms during production or harvesting.


Dataset:


The dataset includes hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota families, drawn from The Audubon Society Field Guide to North American Mushrooms (1981). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended; this latter class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom.


Number of Attributes: 22 (all nominally valued)


Attribute Information: (classes: edible=e, poisonous=p) 






Random Forest:


'Random Forest' is a supervised machine learning algorithm widely used for classification and regression problems. It builds decision trees on different samples and takes their majority vote for classification or their average for regression.


One of the most important features of the Random Forest algorithm is that it can handle datasets containing continuous variables, as in regression, and categorical variables, as in classification. As a result, it generally produces good results for classification problems.
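As a quick illustration (a minimal sketch on toy scikit-learn data, not part of this project's own pipeline), the same ensemble idea is exposed for both tasks:


from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: each tree votes for a class and the forest returns the majority vote.
Xc, yc = make_classification(n_samples=200, n_features=8, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xc, yc)
print(clf.predict(Xc[:3]))

# Regression: each tree predicts a value and the forest returns the average.
Xr, yr = make_regression(n_samples=200, n_features=8, random_state=0)
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(Xr, yr)
print(reg.predict(Xr[:3]))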



Features of Random Forest:


Diversity- Not all features are considered when building an individual tree; each tree is different.


Immune to the curse of dimensionality- Since each tree does not consider all the features, the feature space is reduced.


Parallelization- Each tree is created independently from different data and features, so we can make full use of the CPU to build a random forest model.


Train-Test split- In a random forest model, we don't have to segregate the data into 'train' and 'test' sets, because roughly a third of the data (the out-of-bag samples) is never seen by any given tree; see the sketch after this list.


Stability- 'Stability' arises because the result is based on majority voting or averaging.
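Because the project's own dataset appears only later in this walkthrough, here is a minimal sketch on toy data (parameter values are assumptions) showing how the out-of-bag samples act as a built-in validation set:


from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# oob_score=True scores each sample using only the trees whose bootstrap
# sample did not contain it (roughly a third of the trees for any given row).
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)
print('Out-of-bag accuracy:', forest.oob_score_)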




Understanding Code:


First, let us import the required libraries for the project.


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import pickle

 


Now, let us load the data into the system.


df = pd.read_csv('mushrooms.csv')
df





Let us first get to know about a few important parts of a mushroom.



Also, have a look at important visualizations of our data.


# Pie-chart

print(df["class"].value_counts())
labels = ['Edible', 'Poisonous']
sizes = [4208, 3916]
colors = ['steelblue', 'yellowgreen']
fig1, ax1 = plt.subplots(figsize=(7.5, 7.5))
ax1.pie(sizes, colors=colors, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')
plt.title("Class")
plt.show()





# Bar graphs related to dataset

features = df.columns
f, axes = plt.subplots(22, 1, figsize=(15, 150), sharey=True)
k = 1
for i in range(0, 22):
    s = sns.countplot(x=features[k], data=df, ax=axes[i], palette='GnBu')
    axes[i].set_xlabel(features[k], fontsize=20)
    axes[i].set_ylabel("Count", fontsize=20)
    axes[i].tick_params(labelsize=15)
    k = k + 1
    for p in s.patches:
        s.annotate(format(p.get_height(), '.1f'),
                   (p.get_x() + p.get_width() / 2., p.get_height()),
                   ha='center', va='center',
                   xytext=(0, 9),
                   fontsize=15,
                   textcoords='offset points')





Coming to the 'Data Preprocessing' part, let us search for missing values in the data.


# Checking for missing values

NaN_count = df.isna().sum()
NaN_count
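Note that the original UCI file encodes the missing 'stalk-root' entries as '?' rather than as true null values, so isna() may report zero missing values depending on how your copy of mushrooms.csv was prepared. A small sketch to normalize that case first (assuming the '?' convention):


# If missing entries are stored as '?', convert them to NaN so that
# isna() and fillna() can see them (assumption about the CSV's encoding).
df = df.replace('?', np.nan)
print(df['stalk-root'].isna().sum())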





As you can see, we have missing values in the 'stalk-root' feature, so let us replace the null values with the mode.


df['stalk-root'].fillna(df['stalk-root'].mode()[0], inplace=True)

 


Now, let us encode the categorical columns into numeric form using scikit-learn's 'LabelEncoder.'


from sklearn import preprocessing

categorical = ['class', 'cap-shape','cap-surface','cap-color','bruises','odor','gill-attachment','gill-spacing','gill-size','gill-color','stalk-shape','stalk-root','stalk-surface-above-ring','stalk-surface-below-ring','stalk-color-above-ring','stalk-color-below-ring','veil-type','veil-color','ring-number','ring-type','spore-print-color','population','habitat']
for feature in categorical:
    label_encoder = preprocessing.LabelEncoder()
    df[feature] = label_encoder.fit_transform(df[feature])
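Since a fresh 'LabelEncoder' is fitted and discarded for each column above, the integer codes cannot easily be mapped back to the original letters later. As an optional variant of the same loop (illustrative only, relying on the imports above), the fitted encoders can be kept in a dictionary:


# Variant of the loop above: keep each fitted encoder for later decoding.
encoders = {}
for feature in categorical:
    encoders[feature] = preprocessing.LabelEncoder()
    df[feature] = encoders[feature].fit_transform(df[feature])

# Example: recover the original class labels ('e'/'p') from the encoded column.
print(encoders['class'].inverse_transform(df['class'].head()))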

 


We need to split the data into training and test sets.


Y = df[['class']] 
X = df.drop(['class'], axis=1)

# Splitting for training and testing the data

trainX, testX, trainY, testY = train_test_split(X, Y, test_size=0.2, shuffle=True, random_state=13)
trainX.head()

 


I used the 'Random Forest' algorithm to get the best possible predictions from the data. Note that the accuracy printed below is measured on the training split, so the out-of-bag (OOB) score is the more realistic estimate of generalization; as you can see, both are high.


# Random Forest

random_forest_model = RandomForestClassifier(oob_score=True)
# Flatten the target to a 1-D array to avoid a column-vector warning.
random_forest_model.fit(trainX, trainY.values.ravel())

acc_random_forest = round(random_forest_model.score(trainX, trainY) * 100, 2)
oob_score = random_forest_model.oob_score_

print('Model Accuracy: %.2f%%' % acc_random_forest)
print('OOB Score: %.2f%%' % (oob_score * 100))
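To complement the training-set accuracy and OOB score above, a short addition (not in the original walkthrough) scores the model on the held-out test split:


from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Evaluate on the 20% test split that the model never saw during training.
test_predictions = random_forest_model.predict(testX)
print('Test Accuracy: %.2f%%' % (accuracy_score(testY, test_predictions) * 100))
print(confusion_matrix(testY, test_predictions))
print(classification_report(testY, test_predictions))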




Finally, let us save the model.


with open('model.pkl', 'wb') as f:
    pickle.dump(random_forest_model, f)
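To reuse the saved model later (a minimal sketch; it assumes the feature rows are label-encoded exactly as during training):


# Load the pickled model and predict on encoded feature rows.
with open('model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)

# Example: predict the class for the first few test rows
# (after label encoding, 0 = edible and 1 = poisonous).
print(loaded_model.predict(testX.head()))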


Thank you for your time.


 

