Vaibhav Mali


Predicting Fetal Health With Cardiotocography Data


Model Overview

Reduction of child mortality is reflected in several of the United Nations' Sustainable Development Goals and is a key indicator of human progress.
The UN expects that by 2030, countries end preventable deaths of newborns and children under 5 years of age, with all countries aiming to reduce under‑5 mortality to at least as low as 25 per 1,000 live births.

Parallel to the notion of child mortality is, of course, maternal mortality, which accounted for 295,000 deaths during and following pregnancy and childbirth (as of 2017). The vast majority of these deaths (94%) occurred in low-resource settings, and most could have been prevented.

In light of the above, cardiotocograms (CTGs) are a simple and affordable option for assessing fetal health, allowing healthcare professionals to take action to prevent child and maternal mortality. The equipment works by sending ultrasound pulses and reading their response, thus shedding light on fetal heart rate (FHR), fetal movements, uterine contractions and more.

Dataset Information

2126 fetal cardiotocograms (CTGs) were automatically processed and the respective diagnostic features measured. The CTGs were also classified by three expert obstetricians and a consensus classification label assigned to each of them. Classification was both with respect to a morphologic pattern (A, B, C, ...) and to a fetal state (N, S, P). Therefore the dataset can be used either for 10-class or 3-class experiments.

Inputs
The dataset contains a total of 21 inputs, described below:

FHR baseline (beats per minute);

number of accelerations per second;

number of fetal movements per second;

number of uterine contractions per second;

number of light decelerations per second;

number of severe decelerations per second;

number of prolonged decelerations per second;

percentage of time with abnormal short term variability;

mean value of short term variability;

percentage of time with abnormal long term variability;

mean value of long term variability;

width of FHR histogram;

minimum of FHR histogram;

maximum of FHR histogram;

number of histogram peaks;

number of histogram zeros;

histogram mode;

histogram mean;

histogram median;

histogram variance; and

histogram tendency.

Target Variable

This notebook uses the fetal state as the target variable. As mentioned above, the fetal state is classified into 3 categories (N: Normal, S: Suspect or P: Pathologic). In the dataset these are encoded as numbers:

1 means Normal
2 means Suspect
3 means Pathologic
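The numeric codes can be turned into readable names when printing or plotting. The following is a minimal sketch, not part of the original notebook: the dictionary name FETAL_STATE_LABELS and the toy series are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical lookup table from the numeric codes to class names.
FETAL_STATE_LABELS = {1.0: "Normal", 2.0: "Suspect", 3.0: "Pathologic"}

# Toy target column standing in for df["fetal_health"].
codes = pd.Series([1.0, 2.0, 3.0, 1.0], name="fetal_health")

# map() replaces each code with its name and leaves the original series untouched.
labels = codes.map(FETAL_STATE_LABELS)
print(labels.tolist())
```

The models below still train on the numeric codes; a mapping like this is only for display.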

Here is the dataset link:

https://www.kaggle.com/datasets/andrewmvd/fetal-health-classification

1. IMPORT NECESSARY PYTHON LIBRARIES

import pickle



import pandas as pd



import matplotlib.pyplot as plt

import seaborn as sns

from sklearn.metrics import explained_variance_score, r2_score, classification_report



from sklearn.preprocessing import StandardScaler  # Normalize the data

from sklearn.model_selection import train_test_split  # Split the data



from sklearn.linear_model import LogisticRegression

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

from sklearn.neighbors import KNeighborsClassifier

from sklearn.naive_bayes import GaussianNB

from sklearn.tree import DecisionTreeClassifier

from sklearn.svm import SVC



from time import time



# Measure the efficiency of the model

from sklearn.metrics import mean_absolute_error

2. READ DATA

Now we can load the data from our dataset using the Pandas function read_csv(), because the data is in .csv format. The function sample(5) returns 5 randomly chosen records; here it is used just to check that we read the data correctly.

df = pd.read_csv("fetal_health.csv")

print(df.sample(5))

As we can see, we have 22 columns here (21 columns are our input data and the last column is the prediction target). The dataset has 2126 rows: 2126 measurements extracted from cardiotocograms and classified by expert obstetricians into 3 categories:

Normal

Suspect

Pathological
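Beyond eyeballing a few rows, a quick sanity check of the shape, missing values and duplicate rows is often useful right after loading. The sketch below uses a small made-up frame in place of fetal_health.csv, so the values and the deliberately missing entry are assumptions, not facts about the real data.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the loaded dataset; one value is missing on purpose.
toy = pd.DataFrame({
    "baseline value": [120.0, 132.0, np.nan, 140.0],
    "accelerations": [0.000, 0.006, 0.003, 0.000],
    "fetal_health": [2.0, 1.0, 1.0, 3.0],
})

n_rows, n_cols = toy.shape                     # number of rows and columns
missing_per_column = toy.isna().sum()          # count of NaN values per column
duplicate_rows = int(toy.duplicated().sum())   # rows identical to an earlier row

print(n_rows, n_cols)
print(missing_per_column)
print(duplicate_rows)
```

On the real data the same three checks can be run on df directly.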

After that, we can print all our dataset columns for future use:

cols = df.columns

print(cols)

3. ANALYZE DATA

One of the most important things is to understand the data you work with. Here we will use some well-known methods to make our data easier to understand.

For all dataset columns we will find some statistical information, such as the mean, median and mode (a.k.a. the Three M's of Statistics), the standard deviation and the correlation, using Pandas functions.

3.1. mean

In the code below we use the Pandas mean() function with the axis=0 parameter, which is the axis to reduce over. With axis=0 the mean is computed from top to bottom, over all rows of a column, so the result has one value per column and is indexed by the column names. With axis=1 the statistic is computed from left to right, over all columns of a row, so the result has one value per row.

mean = df.mean(axis=0)

print(mean)
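The difference between the two axes is easiest to see on a tiny example. This is a sketch with a made-up two-column frame; the column names a and b are placeholders.

```python
import pandas as pd

toy = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

# axis=0: reduce over the rows, one mean per column.
column_means = toy.mean(axis=0)

# axis=1: reduce over the columns, one mean per row.
row_means = toy.mean(axis=1)

print(column_means)
print(row_means)
```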

3.2. median

Here we will use the Pandas median() function.

median = df.median(axis=0)

print(median)

3.3. mode

Here we will use the Pandas mode() function.

mode = df.mode(axis=0)

print(mode)
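Unlike mean() and median(), mode() returns a DataFrame rather than a single row, because a column can have several equally frequent values. The toy frame below is an assumption used only to show that behaviour.

```python
import pandas as pd

# Column "a" has two modes (1 and 2); column "b" has a single mode (5).
toy = pd.DataFrame({"a": [1, 1, 2, 2], "b": [5, 5, 5, 7]})

modes = toy.mode(axis=0)

# Columns with fewer modes are padded with NaN in the extra rows.
print(modes)

# The first row holds the smallest mode of each column.
first_mode = modes.iloc[0]
print(first_mode)
```

This is why the printed result for the full dataset may contain more than one row and some NaN entries.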

3.4. correlation

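The Pandas corr() function computes the pairwise correlation between the columns of a DataFrame (Pearson correlation by default). We plot the resulting matrix as a heatmap, then rank the features by the absolute value of their correlation with fetal_health.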

correlation = df.corr().round(2)

plt.figure(figsize=(14, 7))

sns.heatmap(correlation, annot=True, cmap='coolwarm')

plt.show()

sns.set_style('white')

sns.set_palette('coolwarm')

plt.figure(figsize=(13, 6))

plt.title('Distribution of correlation of features')

abs(correlation['fetal_health']).sort_values()[:-1].plot.barh()

plt.show()
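When only the correlation of each feature with the target is of interest, corrwith() gives that single column without building the whole matrix. The following sketch contrasts the two calls on made-up data; the column names x, y and target are placeholders.

```python
import pandas as pd

toy = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
    "target": [4.0, 3.0, 2.0, 1.0],
})

# corr(): full matrix of pairwise correlations between all columns.
pairwise = toy.corr()

# corrwith(): correlation of every feature column with one other series.
with_target = toy.drop(columns="target").corrwith(toy["target"])

print(pairwise.round(2))
print(with_target.round(2))
```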

4. VISUALIZE DATA

To make the data easier to understand we can build some plots. In this example, we will check how many records we have in each class. For this task we can use the plots built into the Pandas library.

First, we set the figure size, give the plot a title, and label the X and Y axes.

Then we count the records in each class with value_counts(); in our case the classes are named 1.0, 2.0 and 3.0. Printing the result shows the exact count for each class.

Next, we build a bar chart for these classes.

Finally, we turn on the grid via grid() and show the plot via show().

#STEP-1

plt.figure(figsize=(18,5))

plt.title('FETAL HEALTH CLASSES')

plt.xlabel('Fetal health class')

plt.ylabel('count')



#STEP-2

value_counts = df["fetal_health"].value_counts()

print(value_counts) 



#STEP-3

value_counts.plot.bar()



#STEP-4

plt.grid()

plt.show()
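Raw counts are easier to compare as shares of the whole, which value_counts(normalize=True) returns directly. The numbers below are invented to illustrate an imbalanced target and are not the counts of the real dataset.

```python
import pandas as pd

# Made-up imbalanced target: 7 records of class 1, 2 of class 2, 1 of class 3.
target = pd.Series([1.0] * 7 + [2.0] * 2 + [3.0] * 1, name="fetal_health")

# normalize=True returns fractions that sum to 1 instead of raw counts.
shares = target.value_counts(normalize=True)
print(shares)
```

If one class dominates, accuracy alone can look good while the rare classes are predicted poorly, which is worth keeping in mind when reading the scores later on.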

fig, ax = plt.subplots(figsize=(14, 6))

sns.kdeplot(df["baseline value"], alpha=0.5, fill=True, ax=ax, hue=df['fetal_health'], palette="coolwarm")  # fill= replaces the deprecated shade=

plt.title('Average Heart Rate Distribution', fontsize=18)

ax.set_xlabel("FHR")

ax.set_ylabel("Frequency")



ax.legend(['Pathological', 'Suspect', 'Normal'])



plt.show()

fig, ax = plt.subplots(figsize=(14, 6))

sns.kdeplot(df["accelerations"], alpha=0.5, fill=True, ax=ax, hue=df['fetal_health'], palette="coolwarm")  # fill= replaces the deprecated shade=

plt.title('The Relationship of Acceleration With the Health of the Fetus', fontsize=18)

ax.set_xlabel("Accelerations")

ax.set_ylabel("Frequency")



ax.legend(['Pathological', 'Suspect', 'Normal'])



plt.show()

fig, ax = plt.subplots(figsize=(14, 6))

sns.kdeplot(df["uterine_contractions"], alpha=0.5, fill=True, ax=ax, hue=df['fetal_health'], palette="coolwarm")  # fill= replaces the deprecated shade=

plt.title('The Relationship of Uterine Contractions With the Health of the Fetus', fontsize=18)

ax.set_xlabel("Uterine Contractions")

ax.set_ylabel("Frequency")



ax.legend(['Pathological', 'Suspect', 'Normal'])



plt.show()

5. SPLIT DATASET INTO TRAIN AND TEST DATA

Here we divide our dataset into two parts, one for model training and one for testing. The code below uses an 80%:20% split (test_size=.2). You can also try other ratios, such as 60%:40%, 70%:30% or 90%:10%.

# Select Features

X = df.drop(columns=['fetal_health'])



# Select Target

y = df['fetal_health']



# Set Training and Testing Data

X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=True, test_size=.2, random_state=44)



print('Shape of training feature:', X_train.shape)

print('Shape of testing feature:', X_test.shape)

print('Shape of training label:', y_train.shape)

print('Shape of testing label:', y_test.shape)
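With classes of unequal size, a plain random split can leave the test set with a different class mix than the training set. One common option, not used in the original code, is the stratify argument of train_test_split. The sketch below uses made-up arrays with a 70/20/10 class mix.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples, 2 features, classes 1/2/3 in a 70/20/10 mix.
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([1] * 70 + [2] * 20 + [3] * 10)

# stratify=y_toy keeps the class proportions the same in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, stratify=y_toy, random_state=44
)

# Count how many test samples fall into classes 1, 2 and 3.
test_counts = np.bincount(y_te)[1:]
print(test_counts)
```

The 20 test samples keep the same 70/20/10 proportions as the full toy target.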

6. NORMALIZE DATA

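It is usually helpful to bring the features onto a common scale before classification, particularly for distance-based and margin-based models such as k-nearest neighbours and SVM. Here we standardize the data with StandardScaler, which rescales each feature to zero mean and unit variance (it does not map values into the range from 0 to 1). The scaler is fitted on the training data only and then applied to the test data.

The output column "fetal_health" holds the target values and shouldn't be scaled. It was already separated into y in the previous step, so only the input features are transformed. The fitted scaler is saved with pickle so the same transformation can be applied to new data later.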

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)

X_test = scaler.transform(X_test)



with open('scaler.pkl', 'wb') as f:  # the with-block closes the file after writing
    pickle.dump(scaler, f)
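Standardization and 0-to-1 normalization are easy to mix up. The sketch below compares StandardScaler with MinMaxScaler on four made-up values; MinMaxScaler is shown only for contrast and is not used in this notebook.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# One toy feature; the test value lies outside the training range.
X_train_toy = np.array([[100.0], [120.0], [140.0], [160.0]])
X_test_toy = np.array([[180.0]])

# Both scalers learn their parameters from the training data only.
standard = StandardScaler().fit(X_train_toy)
minmax = MinMaxScaler().fit(X_train_toy)

X_train_std = standard.transform(X_train_toy)   # zero mean, unit variance
X_train_mm = minmax.transform(X_train_toy)      # squeezed into [0, 1]
X_test_mm = minmax.transform(X_test_toy)        # can fall outside [0, 1]

print(X_train_std.ravel())
print(X_train_mm.ravel())
print(X_test_mm.ravel())
```

Because the parameters come from the training data, unseen values can land outside the training range under either scaler.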

Use Different Models to Compare Accuracy and Find the Best Model:

def evaluate_model(model, x_test, y_test):

    from sklearn import metrics



    # Predict Test Data

    y_pred = model.predict(x_test)



    # Calculate accuracy, precision, recall and F1 score (macro-averaged over the classes)

    acc = metrics.accuracy_score(y_test, y_pred)

    prec = metrics.precision_score(y_test, y_pred, average='macro')

    rec = metrics.recall_score(y_test, y_pred, average='macro')

    f1 = metrics.f1_score(y_test, y_pred, average='macro')



    # Compute the confusion matrix

    cm = metrics.confusion_matrix(y_test, y_pred)



    return {'acc': acc, 'prec': prec, 'rec': rec, 'f1': f1, 'cm': cm}
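The average='macro' setting used above computes the score for each class separately and then takes the plain mean, so a rare class counts as much as a common one. A small worked example with made-up labels shows how this can differ from accuracy.

```python
from sklearn.metrics import accuracy_score, recall_score

# Toy labels: class 2 occurs once and is missed by the prediction.
y_true_toy = [1, 1, 1, 1, 2, 3]
y_pred_toy = [1, 1, 1, 1, 1, 3]

acc_toy = accuracy_score(y_true_toy, y_pred_toy)

# Per-class recall is 1.0, 0.0 and 1.0; the macro average is their mean.
per_class_recall = recall_score(y_true_toy, y_pred_toy, average=None)
macro_recall = recall_score(y_true_toy, y_pred_toy, average='macro')

print(acc_toy)
print(per_class_recall)
print(macro_recall)
```

Accuracy is 5/6 here while macro recall is 2/3, because the single missed class pulls the macro average down.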

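from sklearn.metrics import accuracy_score

# All candidates are classifiers, so the list is named accordingly.
classifiers = [
    LogisticRegression(),
    LinearDiscriminantAnalysis(),
    KNeighborsClassifier(),
    GaussianNB(),
    DecisionTreeClassifier(),
    SVC(),
]

head = 10  # upper bound on how many models from the list to try

for model in classifiers[:head]:
    # Time the training step
    start = time()
    model.fit(X_train, y_train)
    train_time = time() - start

    # Time the prediction step
    start = time()
    y_pred = model.predict(X_test)
    predict_time = time() - start

    print(model)
    print("\tTraining time: %0.3fs" % train_time)
    print("\tPrediction time: %0.3fs" % predict_time)
    print("\tAccuracy:", accuracy_score(y_test, y_pred))
    # The three scores below are regression metrics; they treat the class codes 1-3 as numbers.
    print("\tExplained variance:", explained_variance_score(y_test, y_pred))
    print("\tMean absolute error:", mean_absolute_error(y_test, y_pred))
    print("\tR2 score:", r2_score(y_test, y_pred))
    print()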
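Regression metrics such as mean absolute error measure how far the predicted class code is from the true one, which is not the same as counting correct predictions. The sketch below, using made-up labels, shows two predictions with the same mean absolute error but different accuracy.

```python
from sklearn.metrics import accuracy_score, mean_absolute_error

y_true_toy = [1, 1, 3, 3]

# Prediction A: two errors, each off by one class code.
pred_a = [2, 2, 3, 3]
# Prediction B: one error, off by two class codes.
pred_b = [3, 1, 3, 3]

acc_a = accuracy_score(y_true_toy, pred_a)
acc_b = accuracy_score(y_true_toy, pred_b)
mae_a = mean_absolute_error(y_true_toy, pred_a)
mae_b = mean_absolute_error(y_true_toy, pred_b)

print(acc_a, mae_a)
print(acc_b, mae_b)
```

For choosing between classifiers, accuracy or the macro-averaged scores from evaluate_model are usually the more direct comparison.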



svc = SVC()

svc.fit(X_train, y_train)



svc_evaluate = evaluate_model(svc, X_test, y_test)



container = pd.DataFrame(pd.Series(

    {'Accuracy': svc_evaluate['acc'], 'Precision': svc_evaluate['prec'], 'Recall': svc_evaluate['rec'],

     'F1 Score': svc_evaluate['f1']}, name='Result'))

print(container)



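# Rows are the true classes and columns the predicted classes, in the order 1, 2, 3
class_names = ['Normal', 'Suspect', 'Pathological']
sns.heatmap(svc_evaluate['cm'], annot=True, cmap='coolwarm', cbar=False, linewidths=3, linecolor='w',
            xticklabels=class_names, yticklabels=class_names)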

plt.title('Confusion Matrix', fontsize=16)

plt.show()

From the above results we see that the support vector machine works better than the other models, so we use the support vector machine with some hyperparameter tuning.

from sklearn.model_selection import GridSearchCV



# defining parameter range

param_grid = {'C': [0.1, 1, 10, 100, 1000],

              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],

              'kernel': ['rbf']}



grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=3)



# fitting the model for grid search

grid.fit(X_train, y_train)



# print best parameter after tuning

print(grid.best_params_)



# print how our model looks after hyper-parameter tuning

print(grid.best_estimator_)



grid_predictions = grid.predict(X_test)



# print classification report

print(classification_report(y_test, grid_predictions))
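A possible variation, not what the original notebook does, is to put the scaler and the SVM into one pipeline so that scaling is re-fitted inside every cross-validation fold. The sketch below runs on synthetic data from make_classification; the smaller grid, cv=5 and the f1_macro scoring are assumptions chosen to keep the example quick.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 3-class problem standing in for the CTG features.
X_syn, y_syn = make_classification(
    n_samples=150, n_features=6, n_informative=4, n_classes=3, random_state=0
)

# make_pipeline names the steps after their classes: "standardscaler" and "svc".
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Parameters of a pipeline step are addressed as "<step>__<parameter>".
param_grid_syn = {"svc__C": [0.1, 1, 10], "svc__gamma": [0.1, 0.01]}

search = GridSearchCV(pipeline, param_grid_syn, cv=5, scoring="f1_macro")
search.fit(X_syn, y_syn)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Saving such a fitted pipeline would also store the scaler and the model in a single object.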



with open('grid.pkl', 'wb') as f:  # the with-block closes the file after writing
    pickle.dump(grid, f)
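The saved scaler and model are meant to be loaded later and applied to new measurements in the same order: scale first, then predict. The sketch below shows that round trip on synthetic data, using pickle.dumps and pickle.loads in memory instead of the scaler.pkl and grid.pkl files, and a plain SVC in place of the tuned grid object.

```python
import pickle

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data: three well-separated classes with 3 features each.
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(60, 3))
y_syn = np.repeat([1, 2, 3], 20)
X_syn[y_syn == 2] += 3
X_syn[y_syn == 3] -= 3

scaler_syn = StandardScaler().fit(X_syn)
model_syn = SVC().fit(scaler_syn.transform(X_syn), y_syn)

# Serialize both objects, then restore them as a later session would.
loaded_scaler = pickle.loads(pickle.dumps(scaler_syn))
loaded_model = pickle.loads(pickle.dumps(model_syn))

# A new sample must pass through the same scaler before prediction.
new_sample = X_syn[:1]
original_pred = model_syn.predict(scaler_syn.transform(new_sample))
restored_pred = loaded_model.predict(loaded_scaler.transform(new_sample))
print(original_pred, restored_pred)
```

Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that created them.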

So we finally used a Support Vector Machine, which gives us an accuracy of 95% and an F1 score of 97%.
