
Diabetic Retinopathy

Model Overview

Problem Statement
Diabetic retinopathy is the leading cause of blindness in the working-age population of the developed world. It is estimated to affect over 93 million people. With the help of this model, we can detect diabetic retinopathy at an earlier stage in an automated fashion. It will act as a second pair of eyes for doctors and reduce the time they need to spend on screening.

Data Description
Source: https://www.kaggle.com/c/diabetic-retinopathy-detection/data, about 82 GB of retinal images, both with and without Diabetic Retinopathy
Classes: Diabetic Retinopathy (2816) and Non-Diabetic Retinopathy (3267)

Training Dataset
Found 6083 images belonging to 2 classes.

Testing Dataset
Found 1431 images belonging to 2 classes.

Model
VGG16

Accuracy
Train Accuracy: 90%
Validation Accuracy: 62%


Talking about the dataset used here (https://www.kaggle.com/c/diabetic-retinopathy-detection/data): it is 82.23 GB in total. That is quite huge, and Google Colab (my development tool) does not support more than 25 GB, so I used only a subset of the dataset.

In the dataset, we have five categories (severity levels 0-4):
0 - No DR
1 - Mild
2 - Moderate
3 - Severe
4 - Proliferative DR

But due to the large difference in the number of images per class, and to remove the class imbalance, I reduced the task to binary classification (DR vs. no DR), taking roughly 2000 images for training and 200 images for validation.

So, let's get into our dataset and start exploring it.


The images are provided as a multi-part archive (.001, .002, ... parts), so I used 7-Zip to extract them.
We also get a labels.csv file that maps each image to its label, which we need for training.

We will categorize the images by their names from the label file and then move them into 'zero' and 'one' folders using Jupyter Lab.
First, load the labels and check the shape of the dataset, as in the sketch below.
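
A minimal sketch of loading the labels, assuming labels.csv sits in the working directory and has the image and level columns used by the copy loops below:

import pandas as pd

# Load the label file shipped with the Kaggle dataset; it maps each image
# name (without extension) to a severity level from 0 to 4.
df = pd.read_csv("labels.csv")        # path is an assumption; adjust to where you extracted it

print(df.shape)                       # (number of labelled images, 2)
print(df["level"].value_counts())     # class distribution, handy for spotting imbalance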

Now, I am creating one target directory where I will store all the images labelled with Diabetic Retinopathy.


import os
import shutil

# Target directory for all images that show Diabetic Retinopathy (labels 1-4)
TargetDir = "Dataset/one"

if not os.path.exists(TargetDir):
    os.mkdir(TargetDir)

Here we import os and shutil for copying images from one folder to another. The images live in the train folder, and we have to copy every image with label 1, 2, 3, or 4 into the 'one' folder. There are 4415 such images, so that many images will be transferred.

We iterate through all rows of the label DataFrame, and if the level is 1, 2, 3, or 4, we copy that image. The image names in the CSV do not include the .jpeg extension, so we append it when building the path into the images folder; otherwise the files would not be found.


import os
import shutil

IMAGES_PATH = "train/train/"
TargetDir = "Dataset/one"

cnt = 0

for (i, row) in df.iterrows():
    # Labels 1-4 all indicate some degree of Diabetic Retinopathy
    if row["level"] in (1, 2, 3, 4):
        filename = row["image"] + ".jpeg"           # names in the CSV have no extension
        image_path = os.path.join(IMAGES_PATH, filename)
        image_copy_path = os.path.join(TargetDir, filename)
        if os.path.exists(image_path):
            shutil.copy2(image_path, image_copy_path)
            print("Moving ", cnt)
            cnt += 1
print(cnt)

Now, we will follow the same procedure for the 'zero' folder (images with no Diabetic Retinopathy).


import os
import shutil

IMAGES_PATH = "train/train/"
TargetDir = "Dataset/zero"

# Create the folder if it does not exist yet
os.makedirs(TargetDir, exist_ok=True)

cnt = 0

for (i, row) in df.iterrows():
    # Label 0 means no Diabetic Retinopathy
    if row["level"] == 0:
        filename = row["image"] + ".jpeg"           # names in the CSV have no extension
        image_path = os.path.join(IMAGES_PATH, filename)
        image_copy_path = os.path.join(TargetDir, filename)
        if os.path.exists(image_path):
            shutil.copy2(image_path, image_copy_path)
            print("Moving ", cnt)
            cnt += 1
print(cnt)

Now we have 4415 images in the 'one' folder and 3386 images in the 'zero' folder. We can manually move about 1k images per class into the training folders and 200 per class into the validation folders, as sketched below. I uploaded these images to Google Drive and processed them from there, because files uploaded directly to a Colab runtime are lost when the session ends (after roughly 12 hours); there may also be ways to persist files inside Colab itself, which you can explore on your side.
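
The folder layout (smallData_2k/Train and smallData_2k/Validate with zero and one sub-folders) and the per-class counts in this sketch are assumptions based on the description above:

import os
import random
import shutil

random.seed(42)

# Images per class for each split, per the description above
SPLITS = {"Train": 1000, "Validate": 200}
CLASSES = ["zero", "one"]

for cls in CLASSES:
    files = os.listdir(os.path.join("Dataset", cls))
    random.shuffle(files)
    start = 0
    for split, count in SPLITS.items():
        dest = os.path.join("smallData_2k", split, cls)
        os.makedirs(dest, exist_ok=True)
        for name in files[start:start + count]:
            shutil.copy2(os.path.join("Dataset", cls, name), os.path.join(dest, name))
        start += count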

Let's Code now.


Connecting Google Drive to Colab.
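
The standard way to do this from a notebook cell:

from google.colab import drive

# Mount Google Drive so the dataset and saved models persist across sessions
drive.mount('/content/gdrive')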


Define the training and validation folders.
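
A small sketch, using the Drive paths that the data generators below point to:

# Paths to the small (~2k images) training and validation sets on Drive
TRAIN_PATH = '/content/gdrive/MyDrive/Neha Singh/Projects/ML/smallData_2k/Train/'
VAL_PATH   = '/content/gdrive/MyDrive/Neha Singh/Projects/ML/smallData_2k/Validate/'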


Importing all the libraries we are going to use.


import numpy as np
import matplotlib.pyplot as plt
import keras
from keras.layers import *
from keras.models import *
from keras.preprocessing import image
from keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array

We will be using the VGG16 model.


Instantiate the VGG16 model with input shape (224, 224, 3), without its top classification layers, and with weights pre-trained on ImageNet. The convolutional base is frozen so that only the new dense layers are trained.


VGG = keras.applications.VGG16(input_shape=(224,224,3), include_top=False, weights='imagenet')
VGG.trainable = False

As you can see in the code below, we add three dense layers on top of the frozen base: we first flatten the convolutional output, then add two 256-unit layers with ReLU activation and a final 2-unit softmax layer.
For compiling, we use the Adam optimizer, categorical cross-entropy as the loss, and accuracy as the metric.


model_vgg = keras.Sequential(
    [
        VGG,
        keras.layers.Flatten(),
        keras.layers.Dense(units=256, activation="relu"),
        keras.layers.Dense(units=256, activation="relu"),
        keras.layers.Dense(units=2, activation="softmax"),
    ]
)

model_vgg.compile(optimizer='adam', loss=keras.losses.categorical_crossentropy, metrics=['accuracy'])

Checking the model summary:


model_vgg.summary()


You can see that the model has 21,203,778 parameters in total.
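
As a quick sanity check, that total can be reproduced by hand from the layer sizes (the frozen VGG16 base plus the three dense layers):

vgg16_base = 14_714_688            # VGG16 convolutional base (include_top=False), frozen
flatten    = 7 * 7 * 512           # 25,088 features out of the base for a 224x224 input
dense1     = flatten * 256 + 256   # 6,422,784 weights + biases
dense2     = 256 * 256 + 256       # 65,792
dense3     = 256 * 2 + 2           # 514
print(vgg16_base + dense1 + dense2 + dense3)   # 21,203,778 parameters in total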


Creating Test and Train Data Generator:


trData_vgg = ImageDataGenerator()
trainData_vgg = trData_vgg.flow_from_directory('/content/gdrive/MyDrive/Neha Singh/Projects/ML/smallData_2k/Train/', target_size=(224,224))

tsData_vgg = ImageDataGenerator()
testData_vgg = tsData_vgg.flow_from_directory('/content/gdrive/MyDrive/Neha Singh/Projects/ML/smallData_2k/Validate/', target_size=(224,224))


import tensorflow as tf
import keras

# Run tf.functions eagerly: slower, but errors inside the training loop are easier to debug
tf.config.run_functions_eagerly(True)

Here I create the train data generator with image.ImageDataGenerator; the generator returned by flow_from_directory supports len(), which we need for the steps_per_epoch argument when fitting the model.


# Train from scratch
train_datagen = image.ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
)

test_dataset = image.ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    '/content/gdrive/MyDrive/Neha Singh/Projects/ML/smallData_2k/Train/',
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary')

Model creation:
For VGG16, we train the model by providing the training data, steps per epoch, number of epochs, validation data, and validation steps.

I used 5 epochs for the first run, as this process takes a long time, and then saved the model so that we can continue training from that checkpoint later.


import os

model = model_vgg

model.fit(trainData_vgg,
          steps_per_epoch=len(train_generator),
          epochs=5,
          validation_data=testData_vgg,
          validation_steps=10)
model.save('/content/gdrive/MyDrive/Neha Singh/Projects/ML/Covid_Dataset/Model/model_vgg.h5')
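
Because the model is saved after each run, a later session can reload it from Drive and continue training; a minimal sketch, using the same path as the save call above:

from keras.models import load_model

# Reload the saved checkpoint and keep training from where we left off
model = load_model('/content/gdrive/MyDrive/Neha Singh/Projects/ML/Covid_Dataset/Model/model_vgg.h5')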


Fitting the model again, since the training accuracy was only around 90%.


model.fit(trainData_vgg,
          steps_per_epoch=len(train_generator),
          epochs=3,
          validation_data=testData_vgg,
          validation_steps=10)


We can see that the validation accuracy is 56% while the training accuracy is 93%, which suggests the model is over-fitting.


model.fit(trainData_vgg,
          steps_per_epoch=len(train_generator),
          epochs=3,
          validation_data=testData_vgg,
          validation_steps=10)


On fitting the model again, we achieved 100% accuracy on training but only 60% on validation.


model.fit(trainData_vgg,
          steps_per_epoch=len(train_generator),
          epochs=3,
          validation_data=testData_vgg,
          validation_steps=10)


I was trying to increase the accuracy on the validation dataset, but I can see it will take a long time.

Now, I will try increasing the size of the dataset and then build the model again to check if the accuracy increases.


trData_vgg = ImageDataGenerator()
trainData_vgg = trData_vgg.flow_from_directory('/content/gdrive/MyDrive/Neha/Train', target_size=(224,224))

tsData_vgg = ImageDataGenerator()
testData_vgg = tsData_vgg.flow_from_directory('/content/gdrive/MyDrive/Neha/Validate', target_size=(224,224))


Now, we can see we have 6k+ images for training and around 1.5k images for validation.


import os

model = model_vgg

model.fit(trainData_vgg,
          steps_per_epoch=len(trainData_vgg),   # length of the new, larger training generator
          epochs=5,
          validation_data=testData_vgg,
          validation_steps=10)
model.save('/content/gdrive/MyDrive/Neha/Model/model_vgg.h5')


This time we get 78% accuracy on the training dataset and 66% on the validation dataset, which means the model works better with more images.
Let us fit the model again and check whether the accuracy increases further.


model.fit(trainData_vgg,
          steps_per_epoch=len(trainData_vgg),   # length of the larger training generator
          epochs=3,
          validation_data=testData_vgg,
          validation_steps=10)


Now we get 85% accuracy on the training dataset and 63% on the validation dataset, which is better than the earlier model trained on fewer images.

Finally, let's run one more epoch and see how much better we can do.
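
A sketch of that final single-epoch fit, reusing the same arguments as the earlier runs:

model.fit(trainData_vgg,
          steps_per_epoch=len(trainData_vgg),
          epochs=1,
          validation_data=testData_vgg,
          validation_steps=10)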

This run is much better and more stable.

