Hashwanth Gogineni's other Models Reports

Lung Opacity Detection

Model Overview

Ground Glass Opacity

The hazy grey regions that can be seen in CT scans or X-rays of the lungs are known as ground-glass opacity (GGO). These grey areas indicate increased density inside the lungs. The term comes from a glassmaking technique in which sand is blasted onto the surface of the glass, which makes the glass look hazy white or frosted. GGO can be caused by a variety of factors, including infections, inflammation, and growths. According to a 2020 analysis, GGO was also the most common abnormality among people with COVID-19-related pneumonia.

There are several types of GGO. These include:

  • Diffuse: Diffuse opacities appear in several lobes of one or both lungs. This pattern develops when the air in the lungs is replaced by fluid, inflammatory material, or damaged tissue.

  • Nodular: This type can indicate either benign or malignant disease. GGO nodules that persist across repeated scans could suggest premalignant or malignant growths.

  • Centrilobular: This form of opacity occurs in one or more lung lobules, the small hexagonal units into which the lung is divided. The connective tissue that separates the lobules is not affected.

  • Mosaic: This pattern emerges when small arteries or airways within the lungs become obstructed. The intensity of the opaque patches varies from region to region.

  • Crazy paving: Ground-glass opacity overlaid with thickened lines between and within the lobules, producing a pattern that resembles irregularly shaped paving stones. It can occur when the walls (septa) between the lobules thicken.

  • Halo sign: A nodule surrounded by a rim of ground-glass opacity.

  • Reversed halo sign: A central area of ground-glass opacity that is almost completely encircled by denser, fluid-filled (consolidated) tissue.

Why Lung Opacity Detection?

The project can help healthcare organizations detect lung opacity in patients' chest X-rays so that they can take the necessary measures.


The dataset includes 12,024 chest X-ray images of healthy patients and Lung Opacity patients, i.e. 6,012 images in each class.


In layman's terms, VGG is a deep convolutional neural network (CNN) used to classify images. VGG19, the variant used in this project, has 19 weight layers (16 convolutional and 3 fully connected) and the following characteristics:

  • The network takes a fixed-size (224 × 224) RGB image as input, i.e. a matrix of shape (224, 224, 3).

  • The only preprocessing is subtracting the mean RGB value, computed over the entire training set, from each pixel.

  • The convolution kernels are 3 × 3 with a stride of 1 pixel, so they slide over every position of the image.

  • Spatial padding is applied so that the convolutions preserve the image's spatial resolution.

  • Max pooling is performed over 2 × 2 pixel windows with stride 2.

  • Each convolutional layer is followed by a Rectified Linear Unit (ReLU) to introduce non-linearity. Earlier models used tanh or sigmoid activations; ReLU proved far superior, improving classification performance and reducing training time.

  • Three fully connected layers follow the convolutional part: the first two have 4,096 channels each, the third has 1,000 channels for 1000-way ILSVRC classification, and a final softmax layer produces the class probabilities.
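To make the layer description concrete, here is a small plain-Python sketch that walks through the standard VGG19 layout (configuration "E" in the VGG paper: five convolutional blocks with 2, 2, 4, 4 and 4 layers of 64, 128, 256, 512 and 512 filters) and counts the weights. The block layout comes from the published architecture, not from this report, and the sketch only does arithmetic; it does not build a model.

```python
# Standard VGG19 layout: (number of 3x3 conv layers, filters) for each of the 5 blocks.
VGG19_BLOCKS = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]


def vgg19_summary(input_size=224, channels=3, num_classes=1000):
    """Return (conv_layers, final_spatial_size, conv_params, fc_params)."""
    size, in_ch = input_size, channels
    conv_layers, conv_params = 0, 0
    for n_convs, filters in VGG19_BLOCKS:
        for _ in range(n_convs):
            # 3x3 kernel over in_ch channels, plus one bias per filter.
            # Padding of 1 pixel and stride 1 keep the spatial size unchanged.
            conv_params += (3 * 3 * in_ch + 1) * filters
            in_ch = filters
            conv_layers += 1
        size //= 2  # 2x2 max pooling with stride 2 halves height and width
    flat = size * size * in_ch
    # Three fully connected layers: 4096 -> 4096 -> num_classes (weights + biases).
    fc_params = (flat * 4096 + 4096) + (4096 * 4096 + 4096) + (4096 * num_classes + num_classes)
    return conv_layers, size, conv_params, fc_params


convs, size, conv_p, fc_p = vgg19_summary()
print(convs + 3, "weight layers")  # 16 conv + 3 fully connected
print(f"final feature map: {size}x{size}x512")
print(f"{conv_p:,} conv params + {fc_p:,} FC params = {conv_p + fc_p:,}")
```

Most of the roughly 144 million parameters sit in the three fully connected layers, which is one practical reason to drop them with include_top=False and train a smaller head instead, as the code later in this report does.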

Understanding Code

First, let us import the required libraries for our project.

import os
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
import cv2
# Use tensorflow.keras throughout so every layer comes from the same Keras installation
from tensorflow.keras.preprocessing import image_dataset_from_directory
from tensorflow.keras.layers import Input, Lambda, Dense, Flatten, GlobalAveragePooling2D, Dropout, Activation
from tensorflow.keras.models import Model, Sequential
# Import ImageDataGenerator once, from the public API (tensorflow.python.keras is private)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.metrics import classification_report, log_loss, accuracy_score

Now, let us load the data into our system and convert the data into a dataframe.

image_dir = Path('/content/sample_data/data')

# Get filepaths and labels
filepaths = list(image_dir.glob(r'**/*.png'))
labels = list(map(lambda x: os.path.split(os.path.split(x)[0])[1], filepaths))

filepaths = pd.Series(filepaths, name='Filepaths').astype(str)
labels = pd.Series(labels, name='Labels')

# Concatenate filepaths and labels
image_df = pd.concat([filepaths, labels], axis=1)

# Shuffle the DataFrame and reset index
image_df = image_df.sample(frac=1).reset_index(drop = True)

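As you can see, we collected the image file paths from the data directory, took each image's label from the name of its parent folder, and concatenated 'filepaths' and 'labels' into a DataFrame, which was then shuffled.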
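The label extraction above assumes one sub-folder per class, with the folder name used as the label. The sketch below reproduces the same logic on a throwaway directory so the assumed layout is visible; the folder names 'Lung_Opacity' and 'Normal' and the file names are hypothetical stand-ins for the real dataset.

```python
import os
import tempfile
from pathlib import Path

import pandas as pd

# Build a tiny, hypothetical dataset: one folder per class, with empty .png files.
root = Path(tempfile.mkdtemp())
for class_name, n_files in [("Lung_Opacity", 3), ("Normal", 2)]:
    (root / class_name).mkdir()
    for i in range(n_files):
        (root / class_name / f"img_{i}.png").touch()

# Same steps as in the report: collect paths, label each file by its parent folder.
filepaths = list(root.glob(r"**/*.png"))
labels = [os.path.split(os.path.split(p)[0])[1] for p in filepaths]

image_df = pd.concat(
    [pd.Series(filepaths, name="Filepaths").astype(str), pd.Series(labels, name="Labels")],
    axis=1,
)
print(image_df["Labels"].value_counts())
```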

Let us also split the dataframe for testing and training purposes.

# Separating train and test data
from sklearn.model_selection import train_test_split

train_df, test_df = train_test_split(image_df, train_size=0.85, shuffle=True, random_state=1)

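As you can see, I used the "train_test_split" function to split the dataframe: 85% of the images go to the training set and 15% to the test set, and random_state=1 makes the split reproducible.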
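With the 12,024 images described above, an 85/15 split works out to 10,220 training and 1,804 test images. The sketch below checks this on a synthetic DataFrame with the same class balance (the file names and 'Labels' values are hypothetical placeholders). It also shows one optional variation that the report does not use: passing stratify= so that both splits keep the 50/50 class ratio exactly.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for image_df: 6,012 rows per class, no real images.
image_df = pd.DataFrame({
    "Filepaths": [f"img_{i}.png" for i in range(12_024)],
    "Labels": ["Lung_Opacity"] * 6_012 + ["Normal"] * 6_012,
})

# Split as in the report.
train_df, test_df = train_test_split(image_df, train_size=0.85, shuffle=True, random_state=1)
print(len(train_df), len(test_df))

# Optional variation: stratify on the labels so each split stays exactly balanced.
train_s, test_s = train_test_split(
    image_df, train_size=0.85, shuffle=True, random_state=1, stratify=image_df["Labels"]
)
print(train_s["Labels"].value_counts().to_dict())
print(test_s["Labels"].value_counts().to_dict())
```

Without stratify, the class counts in each split drift slightly from 50/50 depending on the random shuffle; with it, they match exactly.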

train_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.15)

test_datagen = ImageDataGenerator(rescale=1./255)

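Here I used the 'ImageDataGenerator' class to rescale pixel values from the range [0, 255] to [0, 1]. The training generator also reserves 15% of the training images for validation (validation_split=0.15). No augmentation transforms, such as rotations or flips, are configured.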
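The rescale=1./255 argument multiplies every pixel value by 1/255, mapping 8-bit intensities from [0, 255] into [0, 1]. A minimal NumPy sketch of that step on a made-up 2 × 2 image:

```python
import numpy as np

# A made-up 2x2 RGB "image" with 8-bit pixel values.
pixels = np.array([[[0, 128, 255], [64, 64, 64]],
                   [[255, 255, 255], [10, 20, 30]]], dtype=np.uint8)

# What rescale=1./255 does to each pixel before it reaches the model.
scaled = pixels.astype("float32") * (1.0 / 255)
print(scaled.min(), scaled.max())
```

Keeping inputs in a small, fixed range like this generally makes gradient-based training better behaved than feeding raw 0-255 values.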

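# Images are resized to 224x224 to match the input_shape=(224, 224, 3) of the VGG19 base
train_images = train_datagen.flow_from_dataframe(
    dataframe=train_df, x_col='Filepaths', y_col='Labels',
    target_size=(224, 224), class_mode='categorical', subset='training')

val_images = train_datagen.flow_from_dataframe(
    dataframe=train_df, x_col='Filepaths', y_col='Labels',
    target_size=(224, 224), class_mode='categorical', subset='validation')

# shuffle=False keeps the file order so predictions line up with test_images.classes
test_images = test_datagen.flow_from_dataframe(
    dataframe=test_df, x_col='Filepaths', y_col='Labels',
    target_size=(224, 224), class_mode='categorical', shuffle=False)

I then used the 'flow_from_dataframe' method to create generators that load the training, validation (the 15% held out by validation_split), and test images from the file paths in the DataFrames.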
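When flow_from_dataframe infers the classes from the label column, it assigns integer indices in alphanumeric order of the label names; with class_mode='categorical' it then one-hot encodes them, matching the 2-unit softmax output used later. The sketch below mimics that mapping in plain Python and NumPy, assuming hypothetical label names 'Lung_Opacity' and 'Normal'. On the real generator, train_images.class_indices reports the actual mapping.

```python
import numpy as np

# Hypothetical label column, as it might appear in train_df['Labels'].
labels = ["Normal", "Lung_Opacity", "Normal", "Lung_Opacity", "Lung_Opacity"]

# Indices follow alphanumeric order of the label names.
class_indices = {name: i for i, name in enumerate(sorted(set(labels)))}
print(class_indices)  # {'Lung_Opacity': 0, 'Normal': 1}

# One-hot targets for a 2-unit softmax output.
y = np.eye(len(class_indices), dtype="float32")[[class_indices[l] for l in labels]]
print(y)
```

Under these assumed names, index 0 is the opacity class and index 1 the healthy class, which is consistent with the mapping given for the classification report at the end of this report.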

Next, let us get into the modelling part of the project.

from tensorflow.keras.applications import VGG19
vgg = VGG19(include_top=False, weights='imagenet', input_shape=(224,224,3))

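I chose 'VGG19' as the base model and initialized it with weights pre-trained on ImageNet ('imagenet').
Setting include_top=False drops VGG19's original fully connected classifier, so that a new head can be added for our two classes.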

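# Freeze the pre-trained VGG19 layers so that only the new head is trained
for layer in vgg.layers:
    layer.trainable = False

# New classification head for the two classes
x = Flatten()(vgg.output)
x = Dense(512, activation='relu')(x)
x = Dense(512, activation='relu')(x)
prediction = Dense(2, activation='softmax')(x)
model = Model(inputs=vgg.input, outputs=prediction)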

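On top of the VGG19 base, I added a new classification head: a Flatten layer, two fully connected layers of 512 units with ReLU activations, and a final 2-unit softmax layer with one output per class.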
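Assuming the loop over vgg.layers sets layer.trainable = False (freezing the base), only the new head is trained. A short arithmetic sketch of the head's size, assuming the 224 × 224 input used above so that the VGG19 base outputs a 7 × 7 × 512 feature map:

```python
# Shape of VGG19's last feature map for a 224x224 input (include_top=False).
feature_h, feature_w, feature_c = 7, 7, 512


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected (Dense) layer."""
    return n_in * n_out + n_out


flat = feature_h * feature_w * feature_c   # Flatten output size
head = [dense_params(flat, 512),           # Dense(512, relu)
        dense_params(512, 512),            # Dense(512, relu)
        dense_params(512, 2)]              # Dense(2, softmax)
print(flat, head, sum(head))
```

About 98% of those weights sit in the first Dense layer, because Flatten passes all 25,088 features straight into it. Replacing Flatten with GlobalAveragePooling2D (already imported above) would shrink that input to 512 features, which is a common alternative.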

Now, let us compile our model and fit the data.


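# optimizer='adam' is assumed here; the report names only the loss and the metric
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Stop when training accuracy has not improved for 2 consecutive epochs
callback = tf.keras.callbacks.EarlyStopping(monitor='accuracy', patience=2)

# fit_generator is deprecated; model.fit accepts generators directly
history = model.fit(train_images, validation_data=val_images, epochs=25, callbacks=[callback])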

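I used 'categorical_crossentropy' as the loss function and 'accuracy' as the evaluation metric. The EarlyStopping callback ends training if training accuracy does not improve for two consecutive epochs (patience=2), so the model may train for fewer than the 25 epochs requested.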
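For one-hot targets, categorical cross-entropy is the average over samples of -log(p), where p is the probability the softmax assigns to the true class. The NumPy sketch below computes it on two made-up predictions and adds a simplified model of the patience rule: training stops once the monitored value has failed to improve for `patience` consecutive epochs. This mirrors the idea behind tf.keras.callbacks.EarlyStopping but is not Keras's own code, and the accuracy values are invented.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred)) for one-hot y_true."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))


y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])  # made-up softmax outputs
print(round(categorical_crossentropy(y_true, y_pred), 4))  # (-ln 0.9 - ln 0.8) / 2


def stop_epoch(history, patience=2):
    """Simplified patience rule: return the 0-based epoch at which training stops."""
    best, wait = -np.inf, 0
    for epoch, value in enumerate(history):
        if value > best:  # an improvement resets the counter
            best, wait = value, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(history) - 1  # ran all epochs


# Invented training-accuracy curve: no improvement after epoch 2.
print(stop_epoch([0.80, 0.86, 0.90, 0.89, 0.90, 0.95]))
```

Note that the later jump to 0.95 is never reached: the rule only looks back over the last `patience` epochs, which is why a larger patience trades extra training time for robustness to noisy plateaus.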

Now let us understand how our model performed.

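get_acc = history.history['accuracy']
value_acc = history.history['val_accuracy']
get_loss = history.history['loss']
validation_loss = history.history['val_loss']

epochs = range(len(get_acc))

# Accuracy curves
plt.plot(epochs, get_acc, 'r', label='Accuracy of Training data')
plt.plot(epochs, value_acc, 'b', label='Accuracy of Validation data')
plt.title('Training vs validation accuracy')
plt.xlabel('Epoch')
plt.legend()

# Loss curves on a separate figure
plt.figure()
plt.plot(epochs, get_loss, 'r', label='Loss of Training data')
plt.plot(epochs, validation_loss, 'b', label='Loss of Validation data')
plt.title('Training vs validation loss')
plt.xlabel('Epoch')
plt.legend()
plt.show()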

Finally, I used 'matplotlib.pyplot' to generate our model's performance graph.

Also, let us have a look at our model's classification report.

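# Classification Report

from sklearn.metrics import classification_report

# test_images must be created with shuffle=False so predictions line up with test_images.classes
predictions = model.predict(test_images, verbose=1)  # predict_generator is deprecated
y_pred = np.argmax(predictions, axis=-1)  # predicted class index per image
test_labels = test_images.classes         # true class index per image
print(classification_report(test_labels, y_pred))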

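Here in the report, '0' represents 'Lung Opacity' and '1' represents 'Healthy'. These indices come from the generator's class mapping, which can be checked with test_images.class_indices.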
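classification_report compares the true class indices with the argmax of the predicted probabilities and, for each class, reports precision (the fraction of images predicted as that class that truly belong to it), recall (the fraction of true members of the class that were found), their harmonic mean F1, and support (the number of true samples). A small NumPy sketch of those per-class numbers on made-up labels and probabilities:

```python
import numpy as np

# Made-up ground truth and softmax outputs for six test images.
y_true = np.array([0, 0, 0, 1, 1, 1])
probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7],
                  [0.2, 0.8], [0.4, 0.6], [0.3, 0.7]])
y_pred = np.argmax(probs, axis=-1)  # -> [0, 0, 1, 1, 1, 1]

report = {}
for c in (0, 1):
    tp = np.sum((y_pred == c) & (y_true == c))  # correctly predicted as class c
    precision = tp / np.sum(y_pred == c)
    recall = tp / np.sum(y_true == c)
    f1 = 2 * precision * recall / (precision + recall)
    report[c] = (precision, recall, f1, int(np.sum(y_true == c)))
    print(c, round(precision, 2), round(recall, 2), round(f1, 2), report[c][3])
```

sklearn's classification_report(y_true, y_pred) prints the same per-class values, plus macro and weighted averages.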

Thank you for your time