
Pneumonia Detection


Model Overview

Objective:

Given a chest radiograph, the model should predict bounding boxes over the lung regions where pneumonia is present. Normal lungs and pneumonia-affected lungs can be differentiated by the opacity in the images.

Normal lungs appear dark in the image because they are filled with air.




If the air is replaced by another substance, such as fluid or fibrosis, the region appears hazy.




Dataset

Source: https://www.rsna.org/education/ai-resources-and-training/ai-image-challenge/RSNA-Pneumonia-Detection-Challenge-2018
Torrent: https://academictorrents.com/details/95588a735c9ae4d123f3ca408e56570409bcf2a9

Download the dataset using the python3-libtorrent package

Install the required packages

python3 -m pip install --upgrade pip setuptools wheel
python3 -m pip install lbry-libtorrent
apt install python3-libtorrent



Download the torrent file from https://academictorrents.com/details/95588a735c9ae4d123f3ca408e56570409bcf2a9 and add it to a libtorrent session to start the download


import libtorrent as lt

ses = lt.session()
ses.listen_on(6881, 6891)
downloads = []

torrent_path = 'kaggle-pneumonia-jpg-95588a735c9ae4d123f3ca408e56570409bcf2a9.torrent'  # torrent file path
params = {
    "save_path": "/content/Torrent",
    "ti": lt.torrent_info(torrent_path),
}
downloads.append(ses.add_torrent(params))


import time
from IPython.display import display
import ipywidgets as widgets

state_str = [
    "queued",
    "checking",
    "downloading metadata",
    "downloading",
    "finished",
    "seeding",
    "allocating",
    "checking fastresume",
]

layout = widgets.Layout(width="auto")
style = {"description_width": "initial"}
download_bars = [
    widgets.FloatSlider(
        step=0.01, disabled=True, layout=layout, style=style
    )
    for _ in downloads
]
display(*download_bars)

while downloads:
    next_shift = 0
    for index, download in enumerate(downloads[:]):
        bar = download_bars[index + next_shift]
        if not download.is_seed():
            s = download.status()

            bar.description = " ".join(
                [
                    download.name(),
                    str(s.download_rate / 1000),
                    "kB/s",
                    state_str[s.state],
                ]
            )
            bar.value = s.progress * 100
        else:
            next_shift -= 1
            ses.remove_torrent(download)
            downloads.remove(download)
            bar.close()  # Seems to be not working in Colab (see https://github.com/googlecolab/colabtools/issues/726#issue-486731758)
            download_bars.remove(bar)
            print(download.name(), "complete")
    time.sleep(1)



Process the data to train a YOLOv3 model using Keras and TensorFlow

The dataset contains images of both normal and pneumonia-affected lungs. For this detection model, we will only consider the images of pneumonia-affected lungs. In stage_2_train_labels.csv, the value '1' in the Target column indicates that the image contains lungs affected by pneumonia.


import pandas as pd

df = pd.read_csv('/content/Torrent/kaggle-pneumonia-jpg/stage_2_train_labels.csv')
df = df[df['Target'] == 1]



The annotations in the CSV are in x, y, width, height format. They can be converted to xmin, ymin, xmax, ymax format as follows

df['xmin'] = df['x']
df['ymin'] = df['y']
df['xmax'] = df['x'] + df['width']
df['ymax'] = df['y'] + df['height']
df = df[['patientId', 'xmin', 'ymin', 'xmax', 'ymax']]
patientId_order = df['patientId']
s = df.groupby(['patientId']).cumcount()
df1 = df.set_index(['patientId', s]).unstack().sort_index(level=1, axis=1)
df1.columns = [f'{x}{y}' for x, y in df1.columns]
df1 = df1.reset_index()

import os

os.makedirs('/content/pneumonia_detection', exist_ok=True)

df1 = df1.set_index('patientId')
df1 = df1.loc[list(patientId_order.unique())]  # keep the original patientId order

df1.to_csv('/content/pneumonia_detection/annotations.csv')
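
To see what this reshaping produces, here is a small self-contained sketch with made-up patient IDs and coordinates (not values from the dataset): each patientId ends up on a single row, with xmin0/ymin0/xmax0/ymax0 for its first box, xmin1/... for its second, and NaN where a patient has fewer boxes.

import pandas as pd

# hypothetical example: two boxes for patient 'p1', one box for patient 'p2'
toy = pd.DataFrame({
    'patientId': ['p1', 'p1', 'p2'],
    'xmin': [100, 400, 250],
    'ymin': [150, 420, 300],
    'xmax': [300, 600, 450],
    'ymax': [350, 620, 500],
})

s = toy.groupby(['patientId']).cumcount()            # box index per patient: 0, 1, 0
wide = toy.set_index(['patientId', s]).unstack().sort_index(level=1, axis=1)
wide.columns = [f'{x}{y}' for x, y in wide.columns]  # e.g. ('xmin', 0) -> 'xmin0'
print(wide)
# 'p1' has xmin0..ymax0 and xmin1..ymax1 filled; 'p2' has NaN in the box-1 columns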



Generate XML annotation files

The images are annotated with 1, 2, 3, or 4 bounding boxes depending on the number of pneumonia regions present. The code below ignores the images with 4 bounding boxes, as there are only a few such images.


import numpy as np
import pandas as pd
import textwrap
from xml.sax.saxutils import escape

df = pd.read_csv('/content/pneumonia_detection/annotations.csv')

# create df for entries with only 1 bounding box
df_1box = df[df.xmin1.isnull()].copy()
df_1box = df_1box.drop(['xmin1', 'ymin1', 'xmax1', 'ymax1',
                        'xmin2', 'ymin2', 'xmax2', 'ymax2',
                        'xmin3', 'ymin3', 'xmax3', 'ymax3'], axis=1)
box_cols = [c for c in df_1box.columns if c != 'patientId']
df_1box[box_cols] = df_1box[box_cols].astype(int)
df_1box.info()

# create df for entries with 2 bounding boxes
df_2box = df[df.xmin2.isnull() & df.xmin1.notnull()].copy()
df_2box = df_2box.drop(['xmin2', 'ymin2', 'xmax2', 'ymax2',
                        'xmin3', 'ymin3', 'xmax3', 'ymax3'], axis=1)
box_cols = [c for c in df_2box.columns if c != 'patientId']
df_2box[box_cols] = df_2box[box_cols].astype(int)
print(df_2box.info())

# create df for entries with 3 bounding boxes
df_3box = df[df.xmin3.isnull() & df.xmin2.notnull()].copy()
df_3box = df_3box.drop(['xmin3', 'ymin3', 'xmax3', 'ymax3'], axis=1)
box_cols = [c for c in df_3box.columns if c != 'patientId']
df_3box[box_cols] = df_3box[box_cols].astype(int)
print(df_3box.head())
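
As a quick, optional sanity check (not part of the original processing), you can confirm that these three groups cover every annotated image except the few with 4 bounding boxes:

# df here is the wide annotations.csv read above, one row per patient
n_ignored = len(df) - (len(df_1box) + len(df_2box) + len(df_3box))
print(f"Images with 4 bounding boxes (ignored): {n_ignored}")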




Create XML templates

template_1box = textwrap.dedent("""\
<annotation>
  <folder>VOC2007</folder>
  <filename>{filename}.jpg</filename>
  <source>
    <database>The VOC2007 Database</database>
    <annotation>PASCAL VOC2007</annotation>
    <image>flickr</image>
    <flickrid>336426776</flickrid>
  </source>
  <owner>
    <flickrid>Elder Timothy Chaves</flickrid>
    <name>Tim Chaves</name>
  </owner>
  <size>
    <width>1024</width>
    <height>1024</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>opacity</name>
    <pose>unknown</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>{xmin1}</xmin>
      <ymin>{ymin1}</ymin>
      <xmax>{xmax1}</xmax>
      <ymax>{ymax1}</ymax>
    </bndbox>
  </object>
</annotation>""")


template_2box = textwrap.dedent("""\
<annotation>
  <folder>VOC2007</folder>
  <filename>{filename}.jpg</filename>
  <source>
    <database>The VOC2007 Database</database>
    <annotation>PASCAL VOC2007</annotation>
    <image>flickr</image>
    <flickrid>336426776</flickrid>
  </source>
  <owner>
    <flickrid>Elder Timothy Chaves</flickrid>
    <name>Tim Chaves</name>
  </owner>
  <size>
    <width>1024</width>
    <height>1024</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>opacity</name>
    <pose>unknown</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>{xmin1}</xmin>
      <ymin>{ymin1}</ymin>
      <xmax>{xmax1}</xmax>
      <ymax>{ymax1}</ymax>
    </bndbox>
  </object>
  <object>
    <name>opacity</name>
    <pose>unknown</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>{xmin2}</xmin>
      <ymin>{ymin2}</ymin>
      <xmax>{xmax2}</xmax>
      <ymax>{ymax2}</ymax>
    </bndbox>
  </object>
</annotation>""")

template_3box = textwrap.dedent("""\
<annotation>
  <folder>VOC2007</folder>
  <filename>{filename}.jpg</filename>
  <source>
    <database>The VOC2007 Database</database>
    <annotation>PASCAL VOC2007</annotation>
    <image>flickr</image>
    <flickrid>336426776</flickrid>
  </source>
  <owner>
    <flickrid>Elder Timothy Chaves</flickrid>
    <name>Tim Chaves</name>
  </owner>
  <size>
    <width>1024</width>
    <height>1024</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>opacity</name>
    <pose>unknown</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>{xmin1}</xmin>
      <ymin>{ymin1}</ymin>
      <xmax>{xmax1}</xmax>
      <ymax>{ymax1}</ymax>
    </bndbox>
  </object>
  <object>
    <name>opacity</name>
    <pose>unknown</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>{xmin2}</xmin>
      <ymin>{ymin2}</ymin>
      <xmax>{xmax2}</xmax>
      <ymax>{ymax2}</ymax>
    </bndbox>
  </object>
  <object>
    <name>opacity</name>
    <pose>unknown</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>{xmin3}</xmin>
      <ymin>{ymin3}</ymin>
      <xmax>{xmax3}</xmax>
      <ymax>{ymax3}</ymax>
    </bndbox>
  </object>
</annotation>""")




import os

# make sure the output directory for the XML files exists
os.makedirs("/content/pneumonia_detection/annotations", exist_ok=True)

list_dict_1box = []

for index, row in df_1box.iterrows():
    entry = {
        'filename': row['patientId'],
        'xmin1': str(row['xmin0']),
        'ymin1': str(row['ymin0']),
        'xmax1': str(row['xmax0']),
        'ymax1': str(row['ymax0']),
    }
    list_dict_1box.append(entry)

list_dict_2box = []

for index, row in df_2box.iterrows():
    entry = {
        'filename': row['patientId'],
        'xmin1': str(row['xmin0']),
        'ymin1': str(row['ymin0']),
        'xmax1': str(row['xmax0']),
        'ymax1': str(row['ymax0']),
        'xmin2': str(row['xmin1']),
        'ymin2': str(row['ymin1']),
        'xmax2': str(row['xmax1']),
        'ymax2': str(row['ymax1']),
    }
    list_dict_2box.append(entry)

list_dict_3box = []

for index, row in df_3box.iterrows():
    entry = {
        'filename': row['patientId'],
        'xmin1': str(row['xmin0']),
        'ymin1': str(row['ymin0']),
        'xmax1': str(row['xmax0']),
        'ymax1': str(row['ymax0']),
        'xmin2': str(row['xmin1']),
        'ymin2': str(row['ymin1']),
        'xmax2': str(row['xmax1']),
        'ymax2': str(row['ymax1']),
        'xmin3': str(row['xmin2']),
        'ymin3': str(row['ymin2']),
        'xmax3': str(row['xmax2']),
        'ymax3': str(row['ymax2']),
    }
    list_dict_3box.append(entry)


# output all 1-box .xml files
for i in list_dict_1box:
    escaped = {k: escape(v) for k, v in i.items()}
    data = template_1box.format(**escaped)
    with open("/content/pneumonia_detection/annotations/{}.xml".format(i['filename']), "w") as f:
        f.write(data)

# output all 2-box .xml files
for i in list_dict_2box:
    escaped = {k: escape(v) for k, v in i.items()}
    data = template_2box.format(**escaped)
    with open("/content/pneumonia_detection/annotations/{}.xml".format(i['filename']), "w") as f:
        f.write(data)

# output all 3-box .xml files
for i in list_dict_3box:
    escaped = {k: escape(v) for k, v in i.items()}
    data = template_3box.format(**escaped)
    with open("/content/pneumonia_detection/annotations/{}.xml".format(i['filename']), "w") as f:
        f.write(data)


Now all the annotations are available in a single directory and can be used to train a detection model with different architectures such as SSD MobileNet, YOLO, etc.
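
If you plan to train with darknet-style YOLO tooling rather than a VOC-based pipeline, it expects one .txt label file per image in `class x_center y_center width height` format, normalized by the image size. Here is a minimal conversion sketch under a few assumptions (1024x1024 images as in the templates above, a single class 'opacity' with id 0, and an illustrative output directory):

import glob
import os
import xml.etree.ElementTree as ET

IMG_SIZE = 1024  # width and height used in the XML templates above
ANN_DIR = '/content/pneumonia_detection/annotations'    # XML files written earlier
LABEL_DIR = '/content/pneumonia_detection/yolo_labels'  # illustrative output path
os.makedirs(LABEL_DIR, exist_ok=True)

for xml_path in glob.glob(os.path.join(ANN_DIR, '*.xml')):
    root = ET.parse(xml_path).getroot()
    lines = []
    for obj in root.iter('object'):
        box = obj.find('bndbox')
        xmin = float(box.find('xmin').text)
        ymin = float(box.find('ymin').text)
        xmax = float(box.find('xmax').text)
        ymax = float(box.find('ymax').text)
        # YOLO format: class x_center y_center width height, all normalized to [0, 1]
        x_c = (xmin + xmax) / 2 / IMG_SIZE
        y_c = (ymin + ymax) / 2 / IMG_SIZE
        w = (xmax - xmin) / IMG_SIZE
        h = (ymax - ymin) / IMG_SIZE
        lines.append(f"0 {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    name = os.path.splitext(os.path.basename(xml_path))[0]
    with open(os.path.join(LABEL_DIR, name + '.txt'), 'w') as f:
        f.write("\n".join(lines))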



I have trained the model on 0ml.ai, which uses the YOLOv3 framework, and the mAP (mean average precision) achieved is 49.8. If you want to train a custom model on your own, refer to the official implementation: https://pjreddie.com/darknet/yolo/

Click on the 'Inference URL' available in the deployment tab to use the trained model and run detection on custom chest radiograph images.
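
As a purely hypothetical illustration (the actual request format, authentication, and response schema depend on the 0ml.ai deployment, so check the platform's documentation), calling such an inference endpoint from Python might look like this:

import requests

# placeholder URL and field name, not a real endpoint; adjust to the deployment's API
INFERENCE_URL = 'https://example.com/your-inference-url'

with open('chest_xray.jpg', 'rb') as f:
    response = requests.post(INFERENCE_URL, files={'file': f})

print(response.status_code)
print(response.json())  # expected to contain the predicted bounding boxes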
