Lung Cancer Detection

Hashwanth Gogineni

Related Listings

Crop Prediction

ECG Arrhythmia Classi...


Lyme Disease Detection

Census Income Prediction



Model Overview

Lung cancer

Lung cancer is cancer that starts in the lungs and can spread to other parts of the body. Your lungs are two spongy organs in your chest that take in oxygen when you inhale and release carbon dioxide when you exhale. Lung cancer is the leading cause of cancer deaths worldwide. It is most common in smokers, although it can also occur in people who have never smoked. Your risk increases with the length of time you have smoked and the number of cigarettes you smoke. Quitting smoking, even after many years of smoking, can significantly reduce your risk of developing lung cancer.

Symptoms

Lung cancer typically causes no signs or symptoms in its earliest stages. Signs and symptoms usually appear only once the disease has progressed.

Signs and symptoms of lung cancer may include:

Cough that doesn't go away

Coughing up blood

Shortness of breath

Chest pain

Hoarseness

Losing weight without trying

Bone pain

Headache

Why Lung Cancer Detection?

This project can help healthcare organizations detect lung cancer by automatically classifying lung images as benign or as one of two types of cancer.

Dataset

The dataset contains three classes, each with 5,000 images:

Lung adenocarcinoma

Lung benign

Lung squamous cell carcinoma

This gives a total of 15,000 images.

Convolutional Neural Networks (ConvNets)

Convolutional Neural Networks are similar to conventional neural networks: they are made up of neurons with learnable weights and biases. Each neuron receives some inputs, computes a dot product and optionally applies a non-linearity. The whole network still expresses a single differentiable score function, from the raw image pixels at one end to the class scores at the other. It still has a loss function (e.g. SVM/Softmax) on the last (fully connected) layer, and all the tips and tricks developed for training ordinary neural networks still apply.
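To make "dot product, then non-linearity" concrete, here is a minimal pure-Python sketch of a single neuron with a ReLU activation. It also includes the softmax that turns the last layer's class scores into probabilities. The function names and the input values are hypothetical and for illustration only. They are not part of Keras.

```python
import math


def neuron(inputs, weights, bias):
    """One artificial neuron: a dot product plus bias, followed by ReLU."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, z)  # ReLU non-linearity: negative sums become 0


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative values only
print(neuron([0.5, 1.0, 2.0], [0.2, 0.4, 0.1], bias=0.05))
print(softmax([2.0, 1.0, 0.1]))
```

A ConvNet is built from many of these neurons. The final Dense layer with softmax in this project's model produces one probability per class.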

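So, what's new? ConvNet architectures explicitly assume that the inputs are images, and this assumption lets us build specific properties into the architecture. In particular, each neuron in a convolutional layer looks only at a small local region of the input, and the same filter weights are reused at every position in the image. As a result, the forward function is more efficient to implement and the network needs far fewer parameters.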
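A rough way to see the parameter saving is to compare the first layer of this project's model with a fully connected alternative on the same 224 x 224 x 3 input. The sketch below only does the arithmetic. The 64-unit Dense layer is a hypothetical layer chosen for comparison.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units


def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Each filter has kernel_h * kernel_w * in_channels weights plus one bias."""
    return kernel_h * kernel_w * in_channels * filters + filters


flat_input = 224 * 224 * 3  # 150,528 pixel values per image
print("Dense, 64 units:", dense_params(flat_input, 64))
print("Conv2D, 64 filters of 3x3:", conv2d_params(3, 3, 3, 64))
```

The convolutional layer needs 1,792 parameters, while the dense layer needs 9,633,856. Each 3x3 filter connects only to a small neighbourhood, and it is shared across every position in the image.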

Understanding Code

First, let us import the required libraries for our project.

import os
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

# Import everything from tf.keras; mixing standalone 'keras' with the private
# 'tensorflow.python.keras' module can break on newer TensorFlow versions
from tensorflow.keras.preprocessing.image import ImageDataGenerator

from sklearn.metrics import classification_report

Now, let us collect the image file paths and their labels and put them into a dataframe.

image_dir = Path('/content/sample_data/lung_colon_image_set/lung_image_sets')

# Get filepaths and labels
filepaths = list(image_dir.glob(r'**/*.jpeg'))
labels = list(map(lambda x: os.path.split(os.path.split(x)[0])[1], filepaths))

filepaths = pd.Series(filepaths, name='Filepaths').astype(str)
labels = pd.Series(labels, name='Labels')

# Concatenate filepaths and labels
image_df = pd.concat([filepaths, labels], axis=1)

# Shuffle the DataFrame and reset index
image_df = image_df.sample(frac=1).reset_index(drop=True)

# Show the result
image_df.head()

This collects the path of every .jpeg image under the dataset directory and takes each image's label from the name of the folder that contains it. It then combines 'Filepaths' and 'Labels' into a shuffled dataframe.
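The label-extraction step relies on each image sitting in a folder named after its class. The small sketch below uses made-up paths and folder names, and the real folder names in the dataset may differ. It shows how the parent folder's name becomes the label and how to check the per-class counts. For the full dataset, each class should have 5,000 images.

```python
import os
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical file paths, for illustration only
paths = [
    "lung_image_sets/adenocarcinoma/img_001.jpeg",
    "lung_image_sets/benign/img_002.jpeg",
    "lung_image_sets/squamous_cell/img_003.jpeg",
    "lung_image_sets/adenocarcinoma/img_004.jpeg",
]

# The label is the name of the directory that contains the file
labels = [PurePosixPath(p).parent.name for p in paths]

# Same result as the nested os.path.split() expression in the loading code
assert labels == [os.path.split(os.path.split(p)[0])[1] for p in paths]

print(labels)
print(Counter(labels))  # per-class image counts
```

On the real dataframe, image_df['Labels'].value_counts() gives the same per-class check.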

Let us also split the dataframe for testing and training purposes.

# Separating train and test data
from sklearn.model_selection import train_test_split

train_df, test_df = train_test_split(image_df, train_size=0.85, shuffle=True, random_state=1)

The 'train_test_split' function keeps 85% of the images for training and holds out the remaining 15% as a test set.
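The sizes of the subsets follow from simple arithmetic. The sketch below makes several assumptions. It assumes all 15,000 images are found and that 'train_test_split' rounds the float train_size down. It also assumes Keras gives the validation subset int(0.15 * n) rows of the training dataframe. These are assumptions about library internals, so compare the results with the counts Keras prints when each generator is created.

```python
import math

total_images = 15000      # 3 classes x 5,000 images
train_share = 0.85        # train_size passed to train_test_split
validation_split = 0.15   # validation_split passed to ImageDataGenerator

# Assumed rounding: the training share is rounded down, the test set gets the rest
n_train_df = math.floor(train_share * total_images)
n_test = total_images - n_train_df

# Assumed: the validation subset is int(validation_split * n) rows of train_df
n_val = int(validation_split * n_train_df)
n_train = n_train_df - n_val

print("train:", n_train, "validation:", n_val, "test:", n_test)
```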

train_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.15)
test_datagen = ImageDataGenerator(rescale=1./255)

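Both 'ImageDataGenerator' objects rescale pixel values from the 0 to 255 range to 0 to 1, and the training generator also reserves 15% of the training dataframe for validation. No data augmentation (such as random flips or rotations) is applied here.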
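The rescale step multiplies every pixel value by 1/255. This maps the 0 to 255 range of 8-bit images onto 0 to 1. Here is a minimal sketch on a hypothetical row of pixel values:

```python
rescale = 1.0 / 255

# A hypothetical row of 8-bit pixel intensities
pixels = [0, 64, 128, 255]

# What ImageDataGenerator(rescale=1./255) does to each value
scaled = [p * rescale for p in pixels]
print(scaled)
```

If augmentation is wanted, 'ImageDataGenerator' also accepts transform arguments such as rotation_range, horizontal_flip and zoom_range. They would normally be set only on the training generator, so the test images stay unchanged.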

train_images = train_datagen.flow_from_dataframe(
    dataframe=train_df,
    x_col='Filepaths',
    y_col='Labels',
    target_size=(224, 224),
    color_mode='rgb',
    class_mode='categorical',
    batch_size=64,
    shuffle=True,
    seed=42,
    subset='training'
)

val_images = train_datagen.flow_from_dataframe(
    dataframe=train_df,
    x_col='Filepaths',
    y_col='Labels',
    target_size=(224, 224),
    color_mode='rgb',
    class_mode='categorical',
    batch_size=64,
    shuffle=True,
    seed=42,
    subset='validation'
)

test_images = test_datagen.flow_from_dataframe(
    dataframe=test_df,
    x_col='Filepaths',
    y_col='Labels',
    target_size=(224, 224),
    color_mode='rgb',
    class_mode='categorical',
    batch_size=32,
    shuffle=False
)

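The 'flow_from_dataframe' method then creates generators that load the images from disk in batches, resize them to 224 x 224 pixels and one-hot encode the labels. The training and validation generators draw batches of 64 from train_df, and the test generator draws batches of 32 from test_df. The test generator is not shuffled, so its predictions stay aligned with test_images.classes.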
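Each generator yields batches rather than the whole dataset, so one epoch takes ceil(samples / batch_size) steps. The sketch below assumes subset sizes of 10,838 training, 1,912 validation and 2,250 test images. These figures come from split arithmetic, not from a logged run. The sketch also shows the one-hot target that class_mode='categorical' produces for a label, using hypothetical class names.

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Batches needed to see every sample once; the last batch may be smaller."""
    return math.ceil(n_samples / batch_size)


def one_hot(label, class_names):
    """The target vector that class_mode='categorical' would give one label."""
    return [1.0 if name == label else 0.0 for name in class_names]


# Assumed subset sizes, from split arithmetic rather than a logged run
print("train steps:", steps_per_epoch(10838, 64))
print("validation steps:", steps_per_epoch(1912, 64))
print("test steps:", steps_per_epoch(2250, 32))

# Hypothetical class names; the real order is given by test_images.class_indices
classes = ["adenocarcinoma", "benign", "squamous_cell"]
print(one_hot("benign", classes))
```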

Next, let us get into the modelling part of the project.

input_shape = (224, 224, 3)

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=input_shape),
    tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(3, activation='softmax')
])

model.summary()

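The network stacks four 'Conv2D' + 'MaxPool2D' blocks to extract image features, then a 'Flatten' layer to turn the feature maps into a vector. Four 'Dense' layers (512, 128, 128 and 3 units) follow, with two 'Dropout' layers (rate 0.2) for regularization. The final softmax layer outputs one probability for each of the three classes.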
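Every 'Conv2D' here uses a 3x3 kernel with the default 'valid' padding, and every 'MaxPool2D' halves the size, rounding down. As a result, the spatial size shrinks from 224 to 12 before 'Flatten'. The sketch below traces the shapes and counts the parameters by hand. It is a hand calculation under those assumptions, not Keras output, so compare its totals with what model.summary() prints.

```python
def conv_out(size, kernel=3):
    """Output size of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output size of max pooling whose stride equals the pool size."""
    return (size - pool) // pool + 1


size, channels = 224, 3
total_params = 0
for filters in [64, 64, 32, 32]:
    total_params += 3 * 3 * channels * filters + filters  # kernel weights + biases
    size = pool_out(conv_out(size))
    channels = filters
    print(f"after conv + pool: {size}x{size}x{channels}")

flat_size = size * size * channels  # length of the vector produced by Flatten
print("Flatten output:", flat_size)

n_inputs = flat_size
for units in [512, 128, 128, 3]:
    total_params += n_inputs * units + units  # Dropout layers add no parameters
    n_inputs = units

print("Total parameters:", total_params)
```

Almost all of the parameters sit in the first Dense layer (4,608 x 512 weights). This again shows how cheap the convolutional layers are by comparison.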

Now, let us compile our model and fit the data.

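model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Stop once validation loss has not improved for 2 epochs, and keep the best weights
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True)

# model.fit() accepts generators directly; fit_generator() is deprecated
history = model.fit(train_images, validation_data=val_images, epochs=25, callbacks=[callback])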

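The model is compiled with the Adam optimizer, 'categorical_crossentropy' as the loss function and 'accuracy' as the metric. Training runs for up to 25 epochs. The EarlyStopping callback ends it early if the monitored value stops improving for 2 epochs.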
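The training setup uses categorical cross-entropy as the loss and a patience-based EarlyStopping callback. A minimal pure-Python sketch of both ideas follows. The function names and the loss values are hypothetical. The stopping rule is a simplified version of the patience logic, not Keras's own source.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        # Only the true class contributes, because every other target is 0;
        # probabilities are clipped to eps to avoid log(0)
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
    return total / len(y_true)


def stopping_epoch(val_losses, patience=2):
    """Index of the epoch at which patience-based early stopping would halt.

    Training stops once the loss has failed to beat its best value for
    `patience` consecutive epochs; otherwise every epoch runs.
    """
    best = math.inf
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1


# Hypothetical values, for illustration only
print(categorical_crossentropy([[0, 1, 0]], [[0.1, 0.7, 0.2]]))  # about 0.357, i.e. -ln(0.7)
print(stopping_epoch([0.9, 0.7, 0.8, 0.75, 0.6]))  # 3: no improvement at epochs 2 and 3
```

The loss only penalizes the probability assigned to the correct class. A confident wrong prediction therefore costs much more than a hesitant correct one.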

Now let us understand how our model performed.

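get_acc = history.history['accuracy']
value_acc = history.history['val_accuracy']
get_loss = history.history['loss']
validation_loss = history.history['val_loss']

epochs = range(len(get_acc))

# Accuracy curves (create the figure before plotting into it)
plt.figure()
plt.plot(epochs, get_acc, 'r', label='Accuracy of Training data')
plt.plot(epochs, value_acc, 'b', label='Accuracy of Validation data')
plt.title('Training vs validation accuracy')
plt.legend(loc=0)

# Loss curves
plt.figure()
plt.plot(epochs, get_loss, 'r', label='Loss of Training data')
plt.plot(epochs, validation_loss, 'b', label='Loss of Validation data')
plt.title('Training vs validation loss')
plt.legend(loc=0)

plt.show()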
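When reading these curves, it helps to know which epoch scored best on the validation data. The small helper below is hypothetical and not part of Keras. It takes the history.history dictionary and returns that epoch and its value.

```python
def best_epoch(history_dict, key="val_accuracy", mode="max"):
    """Return (epoch index, value) of the best value recorded under `key`.

    Use mode="max" for accuracy-like keys and mode="min" for loss keys.
    """
    values = history_dict[key]
    pick = max if mode == "max" else min
    idx = pick(range(len(values)), key=values.__getitem__)
    return idx, values[idx]
```

Usage: best_epoch(history.history) or best_epoch(history.history, key="val_loss", mode="min"). If training accuracy keeps rising while validation accuracy stalls after that epoch, the model is starting to overfit.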

Also, let us have a look at our model's classification report.

# Classification Report
from sklearn.metrics import classification_report

test_labels = test_images.classes  # true class indices, in file order (shuffle=False)
predictions = model.predict(test_images, verbose=1)  # predict_generator() is deprecated
y_pred = np.argmax(predictions, axis=-1)  # index of the most probable class per image

print(classification_report(test_labels, y_pred))

In the report, '0' represents 'Lung adenocarcinoma', '1' represents 'Lung benign' and '2' represents 'Lung squamous cell carcinoma'. The mapping from label to index can be checked with test_images.class_indices.
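The precision, recall and F1 columns of the report can be reproduced by hand, which makes the report easier to read. Below is a minimal pure-Python sketch using the same 0/1/2 class indices. The function name is hypothetical, and the toy labels in the usage note are made up.

```python
def per_class_scores(y_true, y_pred, n_classes):
    """Precision, recall and F1 for each class index from 0 to n_classes - 1."""
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0  # share of predictions of c that are right
        recall = tp / (tp + fn) if tp + fn else 0.0     # share of true c images that were found
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append((precision, recall, f1))
    return scores


class_names = ["Lung adenocarcinoma", "Lung benign", "Lung squamous cell carcinoma"]
```

Usage: per_class_scores([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], 3) gives precision 0.5, recall 0.5 and F1 0.5 for class 0. For a detection task, recall on the two cancer classes shows how many true cancers the model catches.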

Thank you for your time.
