
Pneumonia Detection


Model Overview

Objective:

Given a chest radiograph, the model should predict bounding boxes over the lung regions where pneumonia is present. Normal lungs and pneumonia-affected lungs can be differentiated by the opacity in the images.

Normal lungs appear dark in the image because they are filled with air.




If the air is replaced by another substance, such as fluid or fibrosis, the region appears hazy.




Dataset

Source: https://www.rsna.org/education/ai-resources-and-training/ai-image-challenge/RSNA-Pneumonia-Detection-Challenge-2018
Torrent: https://academictorrents.com/details/95588a735c9ae4d123f3ca408e56570409bcf2a9

Download the dataset using the python3-libtorrent package

Install the required packages

python3 -m pip install --upgrade pip setuptools wheel
python3 -m pip install lbry-libtorrent
apt install python3-libtorrent



Download the torrent file from https://academictorrents.com/details/95588a735c9ae4d123f3ca408e56570409bcf2a9 and add it to a libtorrent session to start the download


import libtorrent as lt

ses = lt.session()
ses.listen_on(6881, 6891)
downloads = []

torrent_path = 'kaggle-pneumonia-jpg-95588a735c9ae4d123f3ca408e56570409bcf2a9.torrent'  # torrent file path
params = {
    "save_path": "/content/Torrent",
    "ti": lt.torrent_info(torrent_path),
}
downloads.append(ses.add_torrent(params))


import time
from IPython.display import display
import ipywidgets as widgets

state_str = [
    "queued",
    "checking",
    "downloading metadata",
    "downloading",
    "finished",
    "seeding",
    "allocating",
    "checking fastresume",
]

layout = widgets.Layout(width="auto")
style = {"description_width": "initial"}
download_bars = [
    widgets.FloatSlider(
        step=0.01, disabled=True, layout=layout, style=style
    )
    for _ in downloads
]
display(*download_bars)

while downloads:
    next_shift = 0
    for index, download in enumerate(downloads[:]):
        bar = download_bars[index + next_shift]
        if not download.is_seed():
            s = download.status()

            bar.description = " ".join(
                [
                    download.name(),
                    str(s.download_rate / 1000),
                    "kB/s",
                    state_str[s.state],
                ]
            )
            bar.value = s.progress * 100
        else:
            next_shift -= 1
            ses.remove_torrent(download)
            downloads.remove(download)
            bar.close()  # Seems to be not working in Colab (see https://github.com/googlecolab/colabtools/issues/726#issue-486731758)
            download_bars.remove(bar)
            print(download.name(), "complete")
    time.sleep(1)



Process the data to train a YOLOv3 model using Keras and TensorFlow

The dataset contains images of both normal and pneumonia-affected lungs. For this detection model, we will only consider the images of pneumonia-affected lungs. In stage_2_train_labels.csv, the value '1' in the Target column indicates that the image contains lungs affected by pneumonia.


import pandas as pd

df = pd.read_csv('/content/Torrent/kaggle-pneumonia-jpg/stage_2_train_labels.csv')
df = df[df['Target'] == 1]



The annotations in the CSV are in x, y, width, height format. They can be converted to xmin, ymin, xmax, ymax format as follows

df['xmin'] = df['x']
df['ymin'] = df['y']
df['xmax'] = df['x'] + df['width']
df['ymax'] = df['y'] + df['height']
df = df[['patientId', 'xmin', 'ymin', 'xmax', 'ymax']]
patientId_order = df['patientId']
s = df.groupby(['patientId']).cumcount()
df1 = df.set_index(['patientId', s]).unstack().sort_index(level=1, axis=1)
df1.columns = [f'{x}{y}' for x, y in df1.columns]
df1 = df1.reset_index()

import os

os.makedirs('/content/pneumonia_detection', exist_ok=True)

df1 = df1.set_index('patientId')
df1 = df1.loc[list(patientId_order.unique())]  # keep the original patientId order

df1.to_csv('/content/pneumonia_detection/annotations.csv')
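
To see what this reshaping produces, here is a small self-contained sketch with made-up patient IDs and coordinates (not values from the dataset): each patientId ends up on a single row, with xmin0/ymin0/xmax0/ymax0 for its first box, xmin1/... for its second, and NaN where a patient has fewer boxes.

import pandas as pd

# hypothetical example: two boxes for patient 'p1', one box for patient 'p2'
toy = pd.DataFrame({
    'patientId': ['p1', 'p1', 'p2'],
    'xmin': [100, 400, 250],
    'ymin': [150, 420, 300],
    'xmax': [300, 600, 450],
    'ymax': [350, 620, 500],
})

s = toy.groupby(['patientId']).cumcount()            # box index per patient: 0, 1, 0
wide = toy.set_index(['patientId', s]).unstack().sort_index(level=1, axis=1)
wide.columns = [f'{x}{y}' for x, y in wide.columns]  # e.g. ('xmin', 0) -> 'xmin0'
print(wide)
# 'p1' has xmin0..ymax0 and xmin1..ymax1 filled; 'p2' has NaN in the box-1 columns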



Generate XML annotation files

The images are annotated with 1, 2, 3, or 4 bounding boxes depending on the number of pneumonia regions present. The code below ignores the images with 4 bounding boxes, as there are only a few such images.


import numpy as np
import pandas as pd
import textwrap
from xml.sax.saxutils import escape

df = pd.read_csv('/content/pneumonia_detection/annotations.csv')

# create df for entries with only 1 bounding box
df_1box = df[df.xmin1.isnull()].copy()
df_1box = df_1box.drop(['xmin1', 'ymin1', 'xmax1', 'ymax1',
                        'xmin2', 'ymin2', 'xmax2', 'ymax2',
                        'xmin3', 'ymin3', 'xmax3', 'ymax3'], axis=1)
box_cols = [c for c in df_1box.columns if c != 'patientId']
df_1box[box_cols] = df_1box[box_cols].astype(int)
df_1box.info()

# create df for entries with 2 bounding boxes
df_2box = df[df.xmin2.isnull() & df.xmin1.notnull()].copy()
df_2box = df_2box.drop(['xmin2', 'ymin2', 'xmax2', 'ymax2',
                        'xmin3', 'ymin3', 'xmax3', 'ymax3'], axis=1)
box_cols = [c for c in df_2box.columns if c != 'patientId']
df_2box[box_cols] = df_2box[box_cols].astype(int)
print(df_2box.info())

# create df for entries with 3 bounding boxes
df_3box = df[df.xmin3.isnull() & df.xmin2.notnull()].copy()
df_3box = df_3box.drop(['xmin3', 'ymin3', 'xmax3', 'ymax3'], axis=1)
box_cols = [c for c in df_3box.columns if c != 'patientId']
df_3box[box_cols] = df_3box[box_cols].astype(int)
print(df_3box.head())
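
As a quick, optional sanity check (not part of the original processing), you can confirm that these three groups cover every annotated image except the few with 4 bounding boxes:

# df here is the wide annotations.csv read above, one row per patient
n_ignored = len(df) - (len(df_1box) + len(df_2box) + len(df_3box))
print(f"Images with 4 bounding boxes (ignored): {n_ignored}")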




Create XML templates

template_1box = textwrap.dedent("""\
<annotation>
  <folder>VOC2007</folder>
  <filename>{filename}.jpg</filename>
  <source>
    <database>The VOC2007 Database</database>
    <annotation>PASCAL VOC2007</annotation>
    <image>flickr</image>
    <flickrid>336426776</flickrid>
  </source>
  <owner>
    <flickrid>Elder Timothy Chaves</flickrid>
    <name>Tim Chaves</name>
  </owner>
  <size>
    <width>1024</width>
    <height>1024</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>opacity</name>
    <pose>unknown</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>{xmin1}</xmin>
      <ymin>{ymin1}</ymin>
      <xmax>{xmax1}</xmax>
      <ymax>{ymax1}</ymax>
    </bndbox>
  </object>
</annotation>""")


template_2box = textwrap.dedent("""\
<annotation>
  <folder>VOC2007</folder>
  <filename>{filename}.jpg</filename>
  <source>
    <database>The VOC2007 Database</database>
    <annotation>PASCAL VOC2007</annotation>
    <image>flickr</image>
    <flickrid>336426776</flickrid>
  </source>
  <owner>
    <flickrid>Elder Timothy Chaves</flickrid>
    <name>Tim Chaves</name>
  </owner>
  <size>
    <width>1024</width>
    <height>1024</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>opacity</name>
    <pose>unknown</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>{xmin1}</xmin>
      <ymin>{ymin1}</ymin>
      <xmax>{xmax1}</xmax>
      <ymax>{ymax1}</ymax>
    </bndbox>
  </object>
  <object>
    <name>opacity</name>
    <pose>unknown</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>{xmin2}</xmin>
      <ymin>{ymin2}</ymin>
      <xmax>{xmax2}</xmax>
      <ymax>{ymax2}</ymax>
    </bndbox>
  </object>
</annotation>""")

template_3box = textwrap.dedent("""\
<annotation>
  <folder>VOC2007</folder>
  <filename>{filename}.jpg</filename>
  <source>
    <database>The VOC2007 Database</database>
    <annotation>PASCAL VOC2007</annotation>
    <image>flickr</image>
    <flickrid>336426776</flickrid>
  </source>
  <owner>
    <flickrid>Elder Timothy Chaves</flickrid>
    <name>Tim Chaves</name>
  </owner>
  <size>
    <width>1024</width>
    <height>1024</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>opacity</name>
    <pose>unknown</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>{xmin1}</xmin>
      <ymin>{ymin1}</ymin>
      <xmax>{xmax1}</xmax>
      <ymax>{ymax1}</ymax>
    </bndbox>
  </object>
  <object>
    <name>opacity</name>
    <pose>unknown</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>{xmin2}</xmin>
      <ymin>{ymin2}</ymin>
      <xmax>{xmax2}</xmax>
      <ymax>{ymax2}</ymax>
    </bndbox>
  </object>
  <object>
    <name>opacity</name>
    <pose>unknown</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>{xmin3}</xmin>
      <ymin>{ymin3}</ymin>
      <xmax>{xmax3}</xmax>
      <ymax>{ymax3}</ymax>
    </bndbox>
  </object>
</annotation>""")




import os

# make sure the output directory for the XML files exists
os.makedirs("/content/pneumonia_detection/annotations", exist_ok=True)

list_dict_1box = []

for index, row in df_1box.iterrows():
    entry = {
        'filename': row['patientId'],
        'xmin1': str(row['xmin0']),
        'ymin1': str(row['ymin0']),
        'xmax1': str(row['xmax0']),
        'ymax1': str(row['ymax0']),
    }
    list_dict_1box.append(entry)

list_dict_2box = []

for index, row in df_2box.iterrows():
    entry = {
        'filename': row['patientId'],
        'xmin1': str(row['xmin0']),
        'ymin1': str(row['ymin0']),
        'xmax1': str(row['xmax0']),
        'ymax1': str(row['ymax0']),
        'xmin2': str(row['xmin1']),
        'ymin2': str(row['ymin1']),
        'xmax2': str(row['xmax1']),
        'ymax2': str(row['ymax1']),
    }
    list_dict_2box.append(entry)

list_dict_3box = []

for index, row in df_3box.iterrows():
    entry = {
        'filename': row['patientId'],
        'xmin1': str(row['xmin0']),
        'ymin1': str(row['ymin0']),
        'xmax1': str(row['xmax0']),
        'ymax1': str(row['ymax0']),
        'xmin2': str(row['xmin1']),
        'ymin2': str(row['ymin1']),
        'xmax2': str(row['xmax1']),
        'ymax2': str(row['ymax1']),
        'xmin3': str(row['xmin2']),
        'ymin3': str(row['ymin2']),
        'xmax3': str(row['xmax2']),
        'ymax3': str(row['ymax2']),
    }
    list_dict_3box.append(entry)


# output all 1-box .xml files
for i in list_dict_1box:
    escaped = {k: escape(v) for k, v in i.items()}
    data = template_1box.format(**escaped)
    with open("/content/pneumonia_detection/annotations/{}.xml".format(i['filename']), "w") as f:
        f.write(data)

# output all 2-box .xml files
for i in list_dict_2box:
    escaped = {k: escape(v) for k, v in i.items()}
    data = template_2box.format(**escaped)
    with open("/content/pneumonia_detection/annotations/{}.xml".format(i['filename']), "w") as f:
        f.write(data)

# output all 3-box .xml files
for i in list_dict_3box:
    escaped = {k: escape(v) for k, v in i.items()}
    data = template_3box.format(**escaped)
    with open("/content/pneumonia_detection/annotations/{}.xml".format(i['filename']), "w") as f:
        f.write(data)


Now all the annotations are available in a single directory and can be used to train a detection model with different architectures such as SSD MobileNet, YOLO, etc.
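
If you plan to train with darknet-style YOLO tooling rather than a VOC-based pipeline, it expects one .txt label file per image in `class x_center y_center width height` format, normalized by the image size. Here is a minimal conversion sketch under a few assumptions (1024x1024 images as in the templates above, a single class 'opacity' with id 0, and an illustrative output directory):

import glob
import os
import xml.etree.ElementTree as ET

IMG_SIZE = 1024  # width and height used in the XML templates above
ANN_DIR = '/content/pneumonia_detection/annotations'    # XML files written earlier
LABEL_DIR = '/content/pneumonia_detection/yolo_labels'  # illustrative output path
os.makedirs(LABEL_DIR, exist_ok=True)

for xml_path in glob.glob(os.path.join(ANN_DIR, '*.xml')):
    root = ET.parse(xml_path).getroot()
    lines = []
    for obj in root.iter('object'):
        box = obj.find('bndbox')
        xmin = float(box.find('xmin').text)
        ymin = float(box.find('ymin').text)
        xmax = float(box.find('xmax').text)
        ymax = float(box.find('ymax').text)
        # YOLO format: class x_center y_center width height, all normalized to [0, 1]
        x_c = (xmin + xmax) / 2 / IMG_SIZE
        y_c = (ymin + ymax) / 2 / IMG_SIZE
        w = (xmax - xmin) / IMG_SIZE
        h = (ymax - ymin) / IMG_SIZE
        lines.append(f"0 {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    name = os.path.splitext(os.path.basename(xml_path))[0]
    with open(os.path.join(LABEL_DIR, name + '.txt'), 'w') as f:
        f.write("\n".join(lines))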



I have trained the model on 0ml.ai, which uses the YOLOv3 framework, and the mAP (mean average precision) achieved is 49.8. If you want to train a custom model on your own, refer to the official implementation: https://pjreddie.com/darknet/yolo/

Click on the 'Inference URL' available in the deployment tab to use the trained model and run detection on custom chest radiograph images.
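
As a purely hypothetical illustration (the actual request format, authentication, and response schema depend on the 0ml.ai deployment, so check the platform's documentation), calling such an inference endpoint from Python might look like this:

import requests

# placeholder URL and field name, not a real endpoint; adjust to the deployment's API
INFERENCE_URL = 'https://example.com/your-inference-url'

with open('chest_xray.jpg', 'rb') as f:
    response = requests.post(INFERENCE_URL, files={'file': f})

print(response.status_code)
print(response.json())  # expected to contain the predicted bounding boxes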
