Introduction :
The major types of blood cells are red blood cells (erythrocytes), white blood cells (leukocytes), and platelets (thrombocytes). Together, these three kinds of blood cells make up about 45% of the blood tissue by volume; the remaining 55% is plasma, the liquid component of blood. A blood cell disorder is a condition in which there is a problem with the red blood cells, the white blood cells, or the smaller circulating cells called platelets, which are critical for clot formation.
Problem Statement :
The diagnosis of blood-based diseases often involves identifying and characterizing patient blood samples. Automated methods to detect and classify blood cell subtypes have important medical applications.
About Dataset :
This dataset contains 12,500 augmented images of blood cells (JPEG) with accompanying cell type labels (CSV). There are approximately 3,000 images for each of 4 different cell types grouped into 4 different folders (according to cell type). The cell types are Eosinophil, Lymphocyte, Monocyte, and Neutrophil.
Required Libraries :
1. matplotlib
2. numpy
3. tensorflow
4. keras
Let's look at the code.
Load Data :
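import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers, models

# IMAGE_SIZE1, IMAGE_SIZE2, BATCH_SIZE and CHANNELS must be defined before this point.
dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/TRAIN",
    seed=123,
    shuffle=True,
    image_size=(IMAGE_SIZE1, IMAGE_SIZE2),
    batch_size=BATCH_SIZE,
)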
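`image_dataset_from_directory` infers integer labels from the sub-folder names, sorted alphanumerically. The sketch below mimics that mapping with the standard library only, so the label order can be checked before any image is loaded. The helper name `infer_class_indices` and the temporary folder layout are hypothetical and serve only as an illustration.

```python
import tempfile
from pathlib import Path


def infer_class_indices(root):
    """Map each sub-folder name under root to an integer label.

    Mirrors the alphanumeric ordering that image_dataset_from_directory
    applies to class folders; this helper is illustrative, not part of Keras.
    """
    class_dirs = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(class_dirs)}


# Build a throwaway directory that stands in for the training folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["Neutrophil", "Eosinophil", "Monocyte", "Lymphocyte"]:
        (Path(tmp) / name).mkdir()
    class_indices = infer_class_indices(tmp)

print(class_indices)
```

The resulting order matches the class numbering used in the evaluation section (0 for Eosinophil through 3 for Neutrophil).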
Plot some images from our dataset :
class_names = dataset.class_names  # folder names, in label-index order
plt.figure(figsize=(16, 10))
for img, lbl in dataset.take(1):
    for i in range(12):
        ax = plt.subplot(3, 4, i + 1)
        plt.imshow(img[i].numpy().astype('uint8'))
        plt.title(class_names[int(lbl[i])], {'fontsize': 20})
        plt.axis("off")
Split the data into three partitions: training, validation, and testing. Because the dataset is already batched, the partition sizes are counted in batches.
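def get_dataset_partitions_tf(ds, train_split=0.7, val_split=0.1, test_split=0.2, shuffle=True, shuffle_size=10000):
    # The three fractions must cover the whole dataset.
    assert abs(train_split + val_split + test_split - 1.0) < 1e-6
    ds_size = len(ds)  # number of batches, since ds is batched
    if shuffle:
        # reshuffle_each_iteration=False keeps this batch order fixed between
        # passes, so take/skip select the same positions each time.
        ds = ds.shuffle(shuffle_size, seed=12, reshuffle_each_iteration=False)
    train_size = int(train_split * ds_size)
    val_size = int(val_split * ds_size)
    train_ds = ds.take(train_size)
    val_ds = ds.skip(train_size).take(val_size)
    # Whatever remains after the training and validation batches is the test set.
    test_ds = ds.skip(train_size).skip(val_size)
    return train_ds, val_ds, test_ds

train_data, valid_data, test_data = get_dataset_partitions_tf(dataset)
# Cache decoded images in memory and overlap preprocessing with training.
train_ds = train_data.cache().shuffle(1000).prefetch(buffer_size=tf.data.AUTOTUNE)
val_ds = valid_data.cache().shuffle(1000).prefetch(buffer_size=tf.data.AUTOTUNE)
test_ds = test_data.cache().shuffle(1000).prefetch(buffer_size=tf.data.AUTOTUNE)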
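To make the split arithmetic concrete, here is a minimal sketch of the size calculation on its own, without TensorFlow. The function name `partition_sizes` and the figure of 100 batches are assumptions for illustration; the real batch count depends on the dataset and on `BATCH_SIZE`.

```python
def partition_sizes(n_batches, train_split=0.7, val_split=0.1):
    """Return (train, val, test) batch counts for a batched dataset."""
    train_size = int(train_split * n_batches)
    val_size = int(val_split * n_batches)
    # The test partition takes every batch left after the first two.
    test_size = n_batches - train_size - val_size
    return train_size, val_size, test_size


# Hypothetical dataset of 100 batches.
sizes = partition_sizes(100)
print(sizes)
```

Because the training and validation sizes are truncated with `int`, any rounding remainder ends up in the test partition.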
Scaling :
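# On TensorFlow versions older than 2.6, use layers.experimental.preprocessing.Rescaling.
resize_and_rescale = tf.keras.Sequential([
    layers.Rescaling(1./255),  # map pixel values from [0, 255] to [0, 1]
])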
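The rescaling step is plain arithmetic: every pixel value is multiplied by 1/255. A small NumPy sketch of the same operation follows; the 2x2 array is a made-up example, not data from the blood cell dataset.

```python
import numpy as np

# Hypothetical 2x2 single-channel image with 8-bit intensities.
pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Same arithmetic as a Rescaling(1./255) layer: multiply every value by 1/255.
scaled = pixels.astype(np.float32) * (1.0 / 255)

print(scaled.min(), scaled.max())
```

Keeping inputs in the range 0 to 1 is a common choice for training with gradient-based optimizers such as Adam.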
Data Augmentation :
# On TensorFlow versions older than 2.6, these layers live under layers.experimental.preprocessing.
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),
    layers.RandomRotation(0.2),
])
# Apply augmentation to the training partition only.
train_ds = train_ds.map(
    lambda x, y: (data_augmentation(x, training=True), y)
).prefetch(buffer_size=tf.data.AUTOTUNE)
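As a rough NumPy analogue of the two augmentation layers, the sketch below flips an image along each axis with probability 0.5 and converts a rotation factor into degrees. The rotation factor is a fraction of a full turn, so 0.2 corresponds to a range of about -72 to +72 degrees. The helper names are hypothetical, and the flip logic is an approximation of the layer's behaviour rather than its actual implementation.

```python
import numpy as np


def random_flip(image, rng):
    """Flip an (H, W, C) image left-right and/or up-down, each with probability 0.5."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]  # horizontal flip: reverse the width axis
    if rng.random() < 0.5:
        image = image[::-1, :, :]  # vertical flip: reverse the height axis
    return image


def rotation_range_degrees(factor):
    """Convert a RandomRotation factor (fraction of a full turn) to degrees."""
    return -factor * 360.0, factor * 360.0


rng = np.random.default_rng(0)
image = np.arange(2 * 3 * 1).reshape(2, 3, 1)
flipped = random_flip(image, rng)
print(flipped.shape, rotation_range_degrees(0.2))
```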
Model Creation :
input_shape = (BATCH_SIZE, IMAGE_SIZE1, IMAGE_SIZE2, CHANNELS)
n_classes = 4
model = models.Sequential([
    resize_and_rescale,
    # The input shape is supplied once through model.build() below, so it is
    # not repeated on this layer.
    layers.Conv2D(32, kernel_size=(3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    # One softmax probability per cell type.
    layers.Dense(n_classes, activation='softmax'),
])
model.build(input_shape=input_shape)
model.compile(
    optimizer='adam',
    # Labels are integer class indices and the model outputs probabilities.
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    metrics=['accuracy']
)
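Six convolution and pooling blocks shrink the image considerably, so it can help to trace the spatial size by hand. The sketch below does this for 3x3 convolutions without padding followed by 2x2 max pooling. The 256x256 input is an assumed size, since the values of `IMAGE_SIZE1` and `IMAGE_SIZE2` are not given here, and `feature_map_size` is a hypothetical helper.

```python
def feature_map_size(height, width, n_blocks=6, kernel=3, pool=2):
    """Trace spatial size through n_blocks of Conv2D(kernel, 'valid') + MaxPooling2D(pool)."""
    for _ in range(n_blocks):
        # A 'valid' convolution trims kernel - 1 pixels from each dimension.
        height, width = height - (kernel - 1), width - (kernel - 1)
        # Max pooling with the default 'valid' padding floors the division.
        height, width = height // pool, width // pool
        if height < 1 or width < 1:
            raise ValueError("input too small for this many blocks")
    return height, width


# 256x256 is an assumed input size; the article does not state the real one.
h, w = feature_map_size(256, 256)
flattened = h * w * 64  # the last Conv2D layer has 64 filters
print(h, w, flattened)
```

Under these assumptions the `Flatten` layer receives a 2x2x64 tensor, and inputs smaller than 190 pixels on a side would not survive all six blocks.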
Train Model :
# train_ds is already batched, so no batch_size argument is passed.
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=20
)
Accuracy Plots for model :
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
Model Evaluation :
Class 0 : Eosinophil
Class 1 : Lymphocyte
Class 2 : Monocyte
Class 3 : Neutrophil
class_names = [0, 1, 2, 3]
y_true = list()
y_pred = list()
# Collect labels and predictions in the same pass so they stay aligned.
for img, labl in test_data:
    predictions = model.predict(img, verbose=0)  # one row of class probabilities per image
    for i in range(len(img)):
        predicted_class = class_names[np.argmax(predictions[i])]
        y_pred.append(predicted_class)
        y_true.append(int(labl[i].numpy()))
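The step from softmax output to predicted class is just an argmax over each row of probabilities. A minimal sketch with made-up probabilities is shown below; `CLASS_LABELS` follows the class numbering listed above, and the numbers are illustrative, not model output.

```python
import numpy as np

CLASS_LABELS = ["Eosinophil", "Lymphocyte", "Monocyte", "Neutrophil"]

# Made-up softmax outputs for three images; each row sums to 1.
probabilities = np.array([
    [0.70, 0.10, 0.15, 0.05],
    [0.05, 0.80, 0.10, 0.05],
    [0.20, 0.20, 0.10, 0.50],
])

# The predicted class is the index of the largest probability in each row.
predicted_indices = np.argmax(probabilities, axis=1)
predicted_names = [CLASS_LABELS[i] for i in predicted_indices]
print(predicted_names)
```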
Classification Report :
from sklearn.metrics import classification_report
print(classification_report(y_true,y_pred))
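The report lists precision, recall, and F1 score for each class. As a sketch of what those numbers mean, the snippet below computes them for a single class from true positives, false positives, and false negatives. The helper `per_class_metrics` and the toy labels are hypothetical and unrelated to the model's actual results.

```python
import numpy as np


def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for one class label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == label) & (y_true == label))  # correctly predicted as label
    fp = np.sum((y_pred == label) & (y_true != label))  # wrongly predicted as label
    fn = np.sum((y_pred != label) & (y_true == label))  # missed instances of label
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels for six samples.
toy_true = [0, 0, 1, 1, 2, 3]
toy_pred = [0, 1, 1, 1, 2, 0]
metrics_class_1 = per_class_metrics(toy_true, toy_pred, label=1)
print(metrics_class_1)
```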
Confusion Matrix :
from sklearn.metrics import confusion_matrix
import seaborn as sns
import pandas as pd
class_names = [0, 1, 2, 3]
fig, ax = plt.subplots()
# Rows are actual classes, columns are predicted classes.
cnf_matrix = confusion_matrix(y_true, y_pred, labels=class_names)
sns.heatmap(pd.DataFrame(cnf_matrix, index=class_names, columns=class_names),
            annot=True, cmap='Blues', fmt='g')
ax.xaxis.set_label_position('top')
plt.title('Confusion Matrix', {'fontsize': 20})
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
plt.tight_layout()
plt.show()
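A confusion matrix is a table of counts: the entry in row i and column j is the number of samples of actual class i that were predicted as class j. The sketch below builds one by hand with NumPy; `confusion_counts` and the toy labels are hypothetical and only illustrate the counting.

```python
import numpy as np


def confusion_counts(y_true, y_pred, n_classes):
    """Count (actual, predicted) pairs; rows are actual, columns are predicted."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual, predicted] += 1
    return matrix


# Toy labels for six samples.
toy_true = [0, 0, 1, 1, 2, 3]
toy_pred = [0, 1, 1, 1, 2, 0]
toy_matrix = confusion_counts(toy_true, toy_pred, n_classes=4)
print(toy_matrix)
```

Correct predictions sit on the diagonal, so the sum of the diagonal divided by the total count gives the overall accuracy.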
Save Model :
model.save('model.h5')