Now I am creating one target directory where I will store all the images.
import os
import shutil
TargetDir = "Dataset/one"
if not os.path.exists(TargetDir):
    os.makedirs(TargetDir)  # also creates the parent "Dataset" folder if it is missing
Here we import os and shutil to copy images from one folder to another. The images are in the train folder, and we want to store those with labels 1, 2, 3, and 4 in the one folder. There are 4415 such images, so that many will be copied into it.
We iterate through all rows of the DataFrame, and if the level is 1, 2, 3, or 4 we copy that image. The image names in the DataFrame do not include the .jpeg extension, so we append it when building the path into the images folder; otherwise the path would not point to an existing file.
import os
import shutil
IMAGES_PATH = "train/train/"
TargetDir = "Dataset/one"
cnt = 0
for (i, row) in df.iterrows():
    # Levels 1-4 all go into the "one" folder
    if row["level"] in (1, 2, 3, 4):
        filename = row["image"]
        image_path = os.path.join(IMAGES_PATH, filename + ".jpeg")
        image_copy_path = os.path.join(TargetDir, filename)
        # Copy only if the source file actually exists
        if os.path.exists(image_path):
            shutil.copy2(image_path, image_copy_path)
            print("Copying ", cnt)
            cnt += 1
print(cnt)
Now we will follow the same procedure for the zero folder.
import os
import shutil
IMAGES_PATH = "train/train/"
TargetDir = "Dataset/zero"
cnt = 0
for (i, row) in df.iterrows():
    # Level 0 goes into the "zero" folder
    if row["level"] == 0:
        filename = row["image"]
        image_path = os.path.join(IMAGES_PATH, filename + ".jpeg")
        image_copy_path = os.path.join(TargetDir, filename)
        # Copy only if the source file actually exists
        if os.path.exists(image_path):
            shutil.copy2(image_path, image_copy_path)
            print("Copying ", cnt)
            cnt += 1
print(cnt)
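The two loops above differ only in the label test and the target folder, so they could be merged into one pass. The following is a minimal sketch under a few assumptions: copy_by_label is a hypothetical helper name, rows is any iterable of (image name, level) pairs (with the DataFrame above it could be zip(df["image"], df["level"])), and, unlike the loops above, the copy keeps its .jpeg extension on the assumption that later tools select files by extension.

```python
import os
import shutil


def copy_by_label(rows, images_path, dataset_dir):
    """Copy each image into dataset_dir/zero or dataset_dir/one by its level.

    Hypothetical helper: rows is an iterable of (image_name, level) pairs.
    """
    counts = {"zero": 0, "one": 0}
    for name, level in rows:
        # Level 0 is its own class; levels 1-4 are merged into "one"
        label = "zero" if level == 0 else "one"
        target_dir = os.path.join(dataset_dir, label)
        os.makedirs(target_dir, exist_ok=True)
        src = os.path.join(images_path, name + ".jpeg")
        if os.path.exists(src):  # skip rows whose file is missing
            # Assumption: keep the extension on the copied file
            shutil.copy2(src, os.path.join(target_dir, name + ".jpeg"))
            counts[label] += 1
    return counts


# Example call with the paths used in this post (requires df to exist):
# counts = copy_by_label(zip(df["image"], df["level"]), "train/train/", "Dataset")
```

The returned dictionary gives the number of files copied per class, which can be compared against the counts reported in the text.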
Now we have 4415 images in the one folder and 3386 images in the zero folder. We can manually move 1k images from each of the zero and one folders into the training folder, and then 200 images into the validation folder. I uploaded these images to Google Drive and processed them from there, because images uploaded directly to Colab are lost after 12 hours. I believe there is some way to keep the images in Colab, which you can set up on your side.
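The manual split described above could also be scripted. This is only a sketch of one way to do it, not the procedure used for the results in this post: split_class_folder is a hypothetical helper, the seed is arbitrary, and the zero/one subfolder names under Train and Validate are assumptions.

```python
import os
import random
import shutil


def split_class_folder(class_dir, train_dir, val_dir, n_train, n_val, seed=0):
    """Copy a random sample of one class folder into train and validation folders."""
    files = sorted(os.listdir(class_dir))
    random.Random(seed).shuffle(files)  # fixed seed keeps the split repeatable
    train_files = files[:n_train]
    val_files = files[n_train:n_train + n_val]  # no overlap with train_files
    for target, names in ((train_dir, train_files), (val_dir, val_files)):
        os.makedirs(target, exist_ok=True)
        for name in names:
            shutil.copy2(os.path.join(class_dir, name), os.path.join(target, name))
    return len(train_files), len(val_files)


# Example call (assumed folder layout; sizes taken from the text above):
# for label in ("zero", "one"):
#     split_class_folder(f"Dataset/{label}",
#                        f"smallData_2k/Train/{label}",
#                        f"smallData_2k/Validate/{label}",
#                        n_train=1000, n_val=200)
```

Sampling the validation files from the slice after the training files guarantees that no image ends up in both folders.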
Let's Code now.
Connecting Google Drive to Colab.
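The cell for this step is not shown in the post. A minimal sketch, assuming the standard google.colab drive helper and the /content/gdrive mount point implied by the paths used later; the ImportError guard is my own addition so the cell also runs outside Colab:

```python
MOUNT_POINT = "/content/gdrive"  # assumption: matches the paths used later in this post

try:
    # Only available inside a Colab runtime
    from google.colab import drive
    drive.mount(MOUNT_POINT)  # asks for authorization the first time
    in_colab = True
except ImportError:
    # Running outside Colab: nothing to mount
    in_colab = False

print("Running in Colab:", in_colab)
```

After mounting, the Drive contents appear under /content/gdrive/MyDrive.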
Defining the training and validation folders.
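The cell for this step is not shown either. As a sketch, the two folders can be kept in variables and checked before training. The paths are the ones that appear later in this post; TRAIN_DIR, VAL_DIR, and count_images are hypothetical names introduced here for illustration.

```python
import os

# Paths as used later in this post; adjust them to your own Drive layout
BASE_DIR = "/content/gdrive/MyDrive/Neha Singh/Projects/ML/smallData_2k"
TRAIN_DIR = os.path.join(BASE_DIR, "Train")
VAL_DIR = os.path.join(BASE_DIR, "Validate")


def count_images(folder):
    """Return {class_name: number_of_files} for each subfolder of folder."""
    if not os.path.isdir(folder):
        return {}  # folder missing, e.g. Drive not mounted
    return {
        name: len(os.listdir(os.path.join(folder, name)))
        for name in sorted(os.listdir(folder))
        if os.path.isdir(os.path.join(folder, name))
    }


print(count_images(TRAIN_DIR))
print(count_images(VAL_DIR))
```

An empty dictionary here usually means the Drive is not mounted or the path is misspelled, which is cheaper to find out now than after the generators report zero images.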
Importing all the libraries we are going to use.
import numpy as np
import matplotlib.pyplot as plt
import keras
from keras.layers import *
from keras.models import *
from keras.preprocessing import image
from keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array
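We will be using the VGG16 model.
This instantiates the VGG16 model with an input shape of (224, 224, 3) and weights pre-trained on ImageNet. Setting include_top=False drops the original classifier layers, and setting trainable to False freezes the convolutional base.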
VGG = keras.applications.VGG16(input_shape=(224,224,3), include_top=False, weights='imagenet')
VGG.trainable = False
As you can see in the image, we have three dense layers. We first flatten the output of the VGG16 base, then specify the number of units and the activation for each of those three layers.
For compiling, we use adam as the optimizer, categorical_crossentropy as the loss, and accuracy as the metric.
model_vgg = keras.Sequential(
    [
        VGG,
        keras.layers.Flatten(),
        keras.layers.Dense(units=256, activation="relu"),
        keras.layers.Dense(units=256, activation="relu"),
        keras.layers.Dense(units=2, activation="softmax")
    ]
)
model_vgg.compile(optimizer='adam', loss=keras.losses.categorical_crossentropy, metrics=['accuracy'])
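To make the loss choice concrete, here is a small plain-Python sketch of categorical cross-entropy for a single sample with a one-hot label and a two-class softmax output. The probabilities are made-up illustration values, not model outputs, and the eps guard against log(0) is my own addition.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy for one sample: -sum(t * log(p)) over the classes."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# One-hot label for class "one" (index 1), with two illustrative predictions
confident_right = categorical_crossentropy([0.0, 1.0], [0.1, 0.9])
confident_wrong = categorical_crossentropy([0.0, 1.0], [0.9, 0.1])
print(confident_right, confident_wrong)
```

The loss is small when the probability assigned to the true class is high and grows quickly when the model is confidently wrong, which is why it pairs naturally with a softmax output layer.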
Checking the model summary:
model_vgg.summary()
You can see that the model has 21,203,778 parameters in total.
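The total can be checked by hand. The sketch below assumes the commonly reported figure of 14,714,688 parameters for the VGG16 convolutional base without its top layers, and a 7x7x512 feature map for a 224x224 input; dense_params is a hypothetical helper name.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


vgg_base = 14_714_688   # VGG16 conv base without top (assumed, frozen in this post)
flat = 7 * 7 * 512      # 224x224 input after five 2x2 poolings, 512 channels

head = (dense_params(flat, 256)    # Flatten -> Dense(256)
        + dense_params(256, 256)   # Dense(256) -> Dense(256)
        + dense_params(256, 2))    # Dense(256) -> Dense(2)
total = vgg_base + head
print(head, total)  # 6489090 21203778
```

Because the base is frozen, only the 6,489,090 parameters of the dense head are trained.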
Creating Test and Train Data Generator:
trData_vgg = ImageDataGenerator()
trainData_vgg = trData_vgg.flow_from_directory('/content/gdrive/MyDrive/Neha Singh/Projects/ML/smallData_2k/Train/', target_size=(224,224))
tsData_vgg = ImageDataGenerator()
testData_vgg = tsData_vgg.flow_from_directory('/content/gdrive/MyDrive/Neha Singh/Projects/ML/smallData_2k/Validate/', target_size=(224,224))
import tensorflow as tf
import keras
tf.config.run_functions_eagerly(True)
Here I create a second training generator with image.ImageDataGenerator. The generator returned by flow_from_directory supports len(), which gives the number of batches per epoch; we need that value in the model as steps per epoch.
# Augmenting generator; in this post it is used for its len() (number of batches)
train_datagen = image.ImageDataGenerator(
    rescale = 1./255,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True,
)
test_dataset = image.ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    '/content/gdrive/MyDrive/Neha Singh/Projects/ML/smallData_2k/Train/',
    target_size = (224,224),
    batch_size = 32,
    class_mode = 'binary')
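The value that len() returns for such a generator is the number of batches needed to cover the folder once, that is, the image count divided by the batch size, rounded up. A quick sketch of that arithmetic, assuming the 2,000 training images described earlier (1k per class) and the batch size of 32; steps_per_epoch is a hypothetical helper name:

```python
import math


def steps_per_epoch(num_images, batch_size=32):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)


print(steps_per_epoch(2000))  # 63
```

Since this number depends on the folder the generator reads from, it is safest to take len() of the same generator that is passed to model.fit.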
Model creation:
For VGG16, we train the model by passing it the training data, steps per epoch, number of epochs, validation data, and validation steps.
I used 5 epochs for the first run because this process takes a long time, and then saved the model so that training can continue from that point onwards.
import os
model = model_vgg
model.fit(trainData_vgg,
          steps_per_epoch=len(train_generator),
          epochs = 5,
          validation_data = testData_vgg,
          validation_steps = 10)
model.save('/content/gdrive/MyDrive/Neha Singh/Projects/ML/Covid_Dataset/Model/model_vgg.h5')
Fitting the model again, as the accuracy so far was 90%.
model.fit(trainData_vgg,
          steps_per_epoch=len(train_generator),
          epochs = 3,
          validation_data = testData_vgg,
          validation_steps = 10)
We can see that the validation accuracy is 56% while the accuracy on the training set is 93%, which suggests that the model is overfitting.
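One simple way to watch for this is to track the gap between training and validation accuracy per epoch. The sketch below is plain Python and uses only the two figures quoted above. It assumes the dictionary layout of the history attribute on the object returned by model.fit in recent Keras versions, with the keys "accuracy" and "val_accuracy" (older versions may use "acc" and "val_acc"); accuracy_gap is a hypothetical helper name.

```python
def accuracy_gap(history):
    """Return the train-minus-validation accuracy gap for each epoch."""
    return [round(train - val, 4)
            for train, val in zip(history["accuracy"], history["val_accuracy"])]


# Figures quoted in the text for this run (one epoch shown)
history = {"accuracy": [0.93], "val_accuracy": [0.56]}
print(accuracy_gap(history))  # [0.37]
```

A gap that keeps widening from epoch to epoch, as it does in the runs that follow, is the usual sign that further epochs on the same data will not help validation accuracy.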
model.fit(trainData_vgg,
          steps_per_epoch=len(train_generator),
          epochs = 3,
          validation_data = testData_vgg,
          validation_steps = 10)
On fitting the model again, we achieved an accuracy of 100% on the training set but only 60% on the validation set.
model.fit(trainData_vgg,
          steps_per_epoch=len(train_generator),
          epochs = 3,
          validation_data = testData_vgg,
          validation_steps = 10)
I was trying to increase the accuracy on the validation dataset, but I can see that this approach will take a long time.
Now I will try increasing the size of the dataset and then fit the model again to check whether the accuracy increases.
trData_vgg = ImageDataGenerator()
trainData_vgg = trData_vgg.flow_from_directory('/content/gdrive/MyDrive/Neha/Train', target_size=(224,224))
tsData_vgg = ImageDataGenerator()
testData_vgg = tsData_vgg.flow_from_directory('/content/gdrive/MyDrive/Neha/Validate', target_size=(224,224))
Now we can see that we have more than 6k images for training and around 1.5k images for validation.
import os
model = model_vgg
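# Note: train_generator was created from the smaller dataset, so this step count
# does not grow with the new data; len(trainData_vgg) would cover the whole
# larger training set in each epoch.
model.fit(trainData_vgg,
          steps_per_epoch=len(train_generator),
          epochs = 5,
          validation_data = testData_vgg,
          validation_steps = 10)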
model.save('/content/gdrive/MyDrive/Neha/Model/model_vgg.h5')
This time we get 78% accuracy on the training dataset and 66% on the validation dataset, which suggests that the model generalizes better with more images.
Let us fit the model again and check whether the accuracy increases further.
model.fit(trainData_vgg,
          steps_per_epoch=len(train_generator),
          epochs = 3,
          validation_data = testData_vgg,
          validation_steps = 10)
Now we get 85% accuracy on the training dataset and 63% on the validation dataset, which is better than the earlier model trained on fewer images.
Finally, let's run one more epoch and see whether the results improve.
This is much better and more stable.