Hashwanth Gogineni's other Models Reports


Cocoa Health Condition Prediction


Model Overview

Cacao


Cacao (Theobroma cacao), popularly known as cocoa, is a tropical evergreen tree in the Malvaceae family, farmed for its edible seeds. Its genus name, Theobroma, comes from Greek and means "food of the gods".


Cacao is native to the lowland rainforests of the 'Amazon' and 'Orinoco river' basins. It is grown commercially throughout the New World tropics, as well as in 'western Africa' and 'tropical Asia'.


'Cocoa powder,' 'Cocoa butter,' and 'chocolate' are made from its seeds, known as cocoa beans.


Cacao grows to a height of 6–12 meters (20–40 feet) in the forest understory, generally at the lower end of this range. 


Its oblong, leathery leaves can grow up to 30 cm (12 inches) long. They are shed and replaced regularly by new leaves, which are bright red when young.


Its flowers are either foul-smelling or odourless. They can be found at any time of year but appear in large numbers twice a year.


These blooms are roughly 1 cm (0.4 inches) tall and wide and grow in bunches directly from the trunk and limbs. 


Depending on the variety, they may be white, rose, pink, yellow, or brilliant red. In many growing regions they are pollinated by tiny flies known as midges.


Black Pod rot


Infection first appears as a chocolate-brown patch on the pod's surface that spreads quickly until it covers the entire pod.


As the disease progresses, a white fungal growth bearing sporangia appears on the affected pod surface. 


The infected pods eventually turn brown to black.


As a result of infection, the interior tissues and the beans get discoloured.


Cocoa pod borer


The cocoa pod borer damages the pod in two visible ways. Tunnelling larvae leave entry and exit holes in the husk, and their feeding inside the pod causes premature or uneven ripening (yellowing). 


When an infested pod is cut open, the beans typically cling together because of the tunnels and scarring left by larval feeding. 


In severe infestations, the beans clump together and may be hard to separate from the damaged pod.


Why Cocoa Health Condition Prediction?


This project can help agricultural organizations and farmers detect infected cocoa pods early and remove them before the problem spreads.



Dataset


The dataset consists of 2,092 cocoa images, each 1080x1080 pixels. It has two classes, 'Healthy' and 'Unhealthy', with 1,046 images in each.



Convolutional Neural Network (CNN)


A 'Convolutional Neural Network' is a deep learning model that takes an input image, assigns importance (learnable weights and biases) to various aspects/objects in the image, and learns to distinguish between them. 


Compared to other classification methods, a 'ConvNet' requires significantly less pre-processing. 


While basic approaches require filters to be engineered by hand, ConvNets can learn these filters/characteristics themselves given enough training.


The architecture of a ConvNet is inspired by the organization of the visual cortex and resembles the connectivity pattern of neurons in the human brain. Individual neurons respond to stimuli only in a small region of the visual field called the receptive field. A collection of such fields overlaps to cover the full visual field.
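To make the idea of a filter and its receptive field more concrete, here is a minimal NumPy sketch. It is not part of the project code: the toy image, the hand-made edge filter and the helper name 'conv2d_valid' are all assumptions chosen for illustration. Like 'Conv2D', it slides the filter without flipping it (strictly speaking, a cross-correlation).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Each output value only "sees" a kh x kw patch: its receptive field.
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Toy 5x5 image (hypothetical): two dark columns on the left, three bright ones on the right.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hand-engineered vertical-edge filter; a ConvNet would learn weights like these.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

response = conv2d_valid(image, kernel)
print(response)
```

The response is large where the patch straddles the dark-to-bright edge and zero where the patch is uniformly bright, which is the kind of feature detector a trained ConvNet ends up with in its early layers.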



Understanding Code


First, let us import the required libraries for our project.


# Standard library
import os
from pathlib import Path

# Data handling and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Model building (use 'tensorflow.keras' throughout to avoid mixing Keras packages)
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Evaluation
from sklearn.metrics import classification_report


Now, let us load the data into our system and convert the data into a dataframe.


image_dir = Path('/content/sample_data/data')

# Get filepaths and labels
filepaths = list(image_dir.glob(r'**/*.jpg'))
labels = list(map(lambda x: os.path.split(os.path.split(x)[0])[1], filepaths))

filepaths = pd.Series(filepaths, name='Filepaths').astype(str)
labels = pd.Series(labels, name='Labels')

# Concatenate filepaths and labels
image_df = pd.concat([filepaths, labels], axis=1)

# Shuffle the DataFrame and reset index
image_df = image_df.sample(frac=1).reset_index(drop = True)

# Show the result
image_df.head()

As you can see, we collected every '.jpg' file under the data directory, took the name of each image's parent folder as its label, and combined 'Filepaths' and 'Labels' into a shuffled dataframe.
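If the nested 'os.path.split' call looks cryptic, the small sketch below shows the same idea with 'pathlib'. The file names are made up, and I am assuming the class folders are named 'Healthy' and 'Unhealthy' like the dataset classes.

```python
from pathlib import PurePosixPath

def label_from_path(filepath):
    """Return the name of the folder that directly contains the file."""
    return PurePosixPath(filepath).parent.name

# Hypothetical file names, used only to illustrate the folder-to-label mapping.
example_paths = [
    "/content/sample_data/data/Healthy/img_001.jpg",
    "/content/sample_data/data/Unhealthy/img_002.jpg",
]

example_labels = [label_from_path(p) for p in example_paths]
print(example_labels)  # ['Healthy', 'Unhealthy']
```

This is equivalent to 'os.path.split(os.path.split(x)[0])[1]': drop the file name, then keep the last remaining path component.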

Let us also split the dataframe for testing and training purposes.


# Separating train and test data
from sklearn.model_selection import train_test_split

train_df, test_df = train_test_split(image_df, train_size=0.85, shuffle=True, random_state=1)

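As you can see, I used the 'train_test_split' function to keep 85% of the rows for training and hold out the remaining 15% for testing. Setting 'random_state=1' makes the split reproducible.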


train_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.15)

test_datagen = ImageDataGenerator(rescale=1./255)

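As you can see, I used the 'ImageDataGenerator' class to rescale pixel values from the 0-255 range to 0-1 and, for the training generator, to reserve 15% of the training rows for validation. No augmentation (flips, rotations, zooms) is applied here.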
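It helps to work out roughly how many images end up in each subset after the two splits above. The sketch below is only an estimate: it assumes all 2,092 images are found by the '.jpg' glob, and the rounding rules (floor for the training share, truncation for the validation share) are my assumption about how the libraries round, so the real counts may differ by one.

```python
import math

def split_sizes(n_images, train_fraction=0.85, validation_fraction=0.15):
    """Estimate (train, validation, test) sizes for the two-stage split."""
    # Stage 1: train/test split of the whole dataframe.
    n_train_df = math.floor(train_fraction * n_images)
    n_test = n_images - n_train_df
    # Stage 2: validation subset carved out of the training dataframe.
    n_val = int(validation_fraction * n_train_df)
    n_train = n_train_df - n_val
    return n_train, n_val, n_test

print(split_sizes(2092))  # (1512, 266, 314)
```

So the model is trained on roughly 72% of the images, validated on about 13%, and tested on the remaining 15%.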


train_images = train_datagen.flow_from_dataframe(
    dataframe=train_df,
    x_col='Filepaths',
    y_col='Labels',
    target_size=(250, 250),
    color_mode='rgb',
    class_mode='categorical',
    batch_size=64,
    shuffle=True,
    seed=42,
    subset='training'
)

val_images = train_datagen.flow_from_dataframe(
    dataframe=train_df,
    x_col='Filepaths',
    y_col='Labels',
    target_size=(250, 250),
    color_mode='rgb',
    class_mode='categorical',
    batch_size=64,
    shuffle=True,
    seed=42,
    subset='validation'
)

test_images = test_datagen.flow_from_dataframe(
    dataframe=test_df,
    x_col='Filepaths',
    y_col='Labels',
    target_size=(250, 250),
    color_mode='rgb',
    class_mode='categorical',
    batch_size=32,
    shuffle=False
)

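Also, I used the 'flow_from_dataframe' method to create generators for the training, validation, and test data. They read the images from disk in batches, resize them to 250x250, and one-hot encode the labels ('class_mode' is 'categorical').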
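To see what the generators hand to the model, here is a small NumPy sketch of the two transformations involved: rescaling pixels by 1/255 and one-hot encoding the labels. It is an illustration rather than the Keras implementation; the helper 'one_hot' is hypothetical, and I am assuming class indices follow the alphabetical order of the class names.

```python
import numpy as np

# Assumed mapping: class names sorted alphabetically, then numbered from 0.
class_names = sorted({"Unhealthy", "Healthy"})
class_indices = {name: index for index, name in enumerate(class_names)}

def one_hot(label):
    """Encode a class name as a vector with a single 1 at its class index."""
    vector = np.zeros(len(class_indices), dtype="float32")
    vector[class_indices[label]] = 1.0
    return vector

# Three example pixel intensities as stored in an 8-bit image.
pixels = np.array([[0, 128, 255]], dtype=np.uint8)
scaled = pixels * (1.0 / 255)  # same effect as rescale=1./255

print(class_indices)
print(one_hot("Unhealthy"))
print(scaled)
```

On the real generators, the actual mapping can be checked with 'train_images.class_indices'.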

Next, let us get into the modelling part of the project.


from tensorflow.keras.layers import Dense, Conv2D, Dropout, Flatten, MaxPool2D

model = Sequential()

model.add(Conv2D(32, (3, 3), input_shape = (250, 250, 3), padding="same", activation = 'relu', data_format = 'channels_last'))
model.add(Conv2D(32, (3, 3), activation='relu', padding="same"))
model.add(MaxPool2D(pool_size=(3, 3)))

model.add(Conv2D(64, (3, 3), activation='relu', padding="same"))
model.add(Conv2D(64, (3, 3), activation='relu', padding="same"))
model.add(MaxPool2D(pool_size=(3, 3)))

model.add(Conv2D(128, (3, 3), activation='relu', padding="same"))
model.add(Conv2D(128, (3, 3), activation='relu', padding="same"))
model.add(MaxPool2D(pool_size=(3, 3)))

model.add(Flatten())
model.add(Dropout(0.25))
model.add(Dense(256, activation='relu'))
model.add(Dense(2, activation='softmax'))

model.summary()

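Here I used three blocks, each made of two 'Conv2D' layers followed by a 'MaxPool2D' layer, then a 'Flatten' layer, a 'Dropout' layer, and two 'Dense' layers. That gives 6 'Conv2D', 3 'MaxPool2D', 1 'Flatten', 1 'Dropout' and 2 'Dense' layers in total, and the final 'Dense' layer has two softmax units, one per class.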
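As a sanity check on what 'model.summary()' prints, the feature-map sizes and parameter counts can be worked out by hand. The sketch below does that arithmetic in plain Python. It assumes the Keras defaults used above: 'same' padding keeps the spatial size in every 'Conv2D', and 'MaxPool2D' uses a stride equal to its pool size with no padding. The helper names are mine.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Weights plus biases of a square-kernel Conv2D layer."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

def pooled_size(size, pool):
    """Output width/height of MaxPool2D with stride == pool and no padding."""
    return (size - pool) // pool + 1

size, channels, total_params = 250, 3, 0
for filters in (32, 64, 128):
    total_params += conv_params(3, channels, filters)  # first conv of the block
    total_params += conv_params(3, filters, filters)   # second conv of the block
    channels = filters
    size = pooled_size(size, 3)                         # 250 -> 83 -> 27 -> 9

flattened = size * size * channels                      # input width of Dense(256)
total_params += flattened * 256 + 256                   # Dense(256)
total_params += 256 * 2 + 2                             # Dense(2)

print(size, flattened, total_params)  # 9 10368 2941986
```

Most of the parameters sit in the first 'Dense' layer, which is one reason a 'Dropout' layer is placed right before it.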


Now, let us compile our model and fit the data.


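model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Stop once the training accuracy has not improved for 2 consecutive epochs
callback = tf.keras.callbacks.EarlyStopping(monitor='accuracy', patience=2)

# 'fit' accepts generators directly; 'fit_generator' is deprecated in TensorFlow 2.x
history = model.fit(train_images, validation_data=val_images, epochs=50, callbacks=[callback])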

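As you can see, I used 'categorical_crossentropy' as the loss function, 'accuracy' as the evaluation metric, and 'adam' as the optimizer. The 'EarlyStopping' callback ends training once the training accuracy stops improving for two consecutive epochs.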
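For readers who want to see what the loss measures, here is a minimal pure-Python sketch of softmax followed by categorical cross-entropy for a single image. It is a simplified illustration, not the Keras implementation; the logits are made-up numbers and the clipping constant 'eps' is my own choice.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract the max for stability
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))

# Hypothetical raw scores: the network favours class 0 ('Healthy').
probs = softmax([2.0, 0.0])

loss_if_healthy = categorical_crossentropy([1, 0], probs)    # prediction agrees
loss_if_unhealthy = categorical_crossentropy([0, 1], probs)  # prediction disagrees

print(probs, loss_if_healthy, loss_if_unhealthy)
```

The loss is small when the model puts high probability on the true class and grows quickly when it is confidently wrong, which is what the optimizer pushes against during training.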


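get_acc = history.history['accuracy']
value_acc = history.history['val_accuracy']
get_loss = history.history['loss']
validation_loss = history.history['val_loss']

epochs = range(len(get_acc))

# Accuracy curves
plt.plot(epochs, get_acc, 'r', label='Accuracy of Training data')
plt.plot(epochs, value_acc, 'b', label='Accuracy of Validation data')
plt.title('Training vs validation accuracy')
plt.legend(loc=0)

# Loss curves on a second figure
plt.figure()
plt.plot(epochs, get_loss, 'r', label='Loss of Training data')
plt.plot(epochs, validation_loss, 'b', label='Loss of Validation data')
plt.title('Training vs validation loss')
plt.legend(loc=0)
plt.show()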


Finally, I used 'matplotlib.pyplot' to plot the model's training history, so we can compare its performance on the training and validation data across epochs.

Also, let us have a look at our model's classification report.


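# Classification Report

from sklearn.metrics import classification_report

# True class indices, in file order (the test generator uses shuffle=False)
test_labels = test_images.classes
# 'predict' accepts generators directly; 'predict_generator' is deprecated in TensorFlow 2.x
predictions = model.predict(test_images, verbose=1)
# Pick the class with the highest softmax probability for each image
y_pred = np.argmax(predictions, axis=-1)
print(classification_report(test_labels, y_pred))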


Here in the report '0' represents 'Healthy' and '1' represents 'Unhealthy'. You can confirm this mapping with 'test_images.class_indices'.
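The report lists precision, recall and F1-score for each class. As a reminder of how those numbers come about, here is a small pure-Python sketch for one class. The labels and predictions are invented toy values, not results from this model, and the helper 'binary_report' is hypothetical.

```python
def binary_report(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: 0 = 'Healthy', 1 = 'Unhealthy' (made-up values).
toy_true = [0, 0, 0, 1, 1, 1]
toy_pred = [0, 0, 1, 1, 1, 0]

precision, recall, f1 = binary_report(toy_true, toy_pred, positive=1)
print(precision, recall, f1)
```

For this use case, recall on the 'Unhealthy' class is worth watching closely, since a missed infected pod stays on the tree.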



Thank you for your time

