# data manipulation
import pickle
import pandas as pd
import numpy as np
# data visualization
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib
import plotly.graph_objects as go
import plotly.express as px
# Model evaluation
from sklearn.metrics import roc_auc_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score
# Default plotting theme: seaborn darkgrid plus larger matplotlib figures/fonts.
sns.set(context='notebook', style='darkgrid', palette='colorblind', font='sans-serif', font_scale=1, rc=None)
matplotlib.rcParams.update({
    'figure.figsize': [8, 8],
    'font.size': 15,
    'font.family': 'sans-serif',
})
# Load the heart-failure dataset and run a quick exploratory pass.
train = pd.read_csv('heart_failure_clinical_records_dataset.csv')
# Bare expressions only render inside a notebook; print() so a plain script
# shows them too.
print(train.head(6))
train.info()  # info() writes to stdout itself
# Pie chart of column dtypes (int64 vs float64).
train.dtypes.value_counts().plot.pie(explode=[0.1, 0.1], autopct='%1.1f%%', shadow=True)
plt.title('type of our data')
print(train.describe())
print(train.isnull().sum())  # missing-value count per column
train.hist(figsize=(15, 15), edgecolor='black')
# Class balance of the target variable.
train.DEATH_EVENT.value_counts().plot.pie(explode=[0.1, 0.1], autopct='%1.1f%%', shadow=True)
plt.title('the % of deaths')
plt.figure(figsize=(20, 6))
sns.countplot(x='age', data=train)
plt.xticks(rotation=90)
plt.title('the ages of our patients')  # fixed typo: 'persone'
Distribution Of Age
# Age distribution as a plotly histogram with 2-year-wide bins.
age_trace = go.Histogram(
    x=train['age'],
    xbins=dict(start=40, end=95, size=2),  # bins used for histogram
    marker_color='#e8ab60',
    opacity=1,
)
fig = go.Figure(data=[age_trace])
fig.update_layout(
    title_text='Distribution of Age',
    xaxis_title_text='AGE',
    yaxis_title_text='COUNT',
    bargap=0.05,  # gap between bars of adjacent location coordinates
    xaxis={'showgrid': False},
    yaxis={'showgrid': False},
    template='presentation',
)
fig.show()
Distribution of AGE Vs DEATH_EVENT
# Age distribution split by outcome, with a violin marginal to compare shapes.
fig = px.histogram(train, x="age", color="DEATH_EVENT", marginal="violin", hover_data=train.columns,
title ="Distribution of AGE Vs DEATH_EVENT",
labels={"age": "AGE"},
template="plotly",)
fig.show()
# Box plot to eyeball outliers in ejection_fraction.
sns.boxplot(x = train.ejection_fraction, color = 'green')
plt.show()
We can see there are two outlier values. Let's remove them (70 and 80).
# Inspect the outlier rows (ejection_fraction >= 70), then drop them.
# A bare expression has no visible effect in a plain script, so print it.
print(train[train['ejection_fraction'] >= 70])
train = train[train['ejection_fraction'] < 70]
# plotly.graph_objects is already imported as `go` at the top of the file,
# so the redundant re-import was removed.
fig = go.Figure()
fig.add_trace(go.Histogram(
    x=train['ejection_fraction'],
    xbins=dict(start=14, end=80, size=2),  # bins used for histogram
    marker_color='#A7F432',
    opacity=1,
))
fig.update_layout(
    title_text='EJECTION FRACTION DISTRIBUTION',
    xaxis_title_text='EJECTION FRACTION',
    yaxis_title_text='COUNT',
    bargap=0.05,  # gap between bars of adjacent location coordinates
    template='plotly_dark',
)
fig.show()
# Box plot of the follow-up period (time) to check for outliers.
sns.boxplot(x=train.time, color = 'yellow')
plt.show()
No outliers in time
# Box plot of serum_creatinine; the high values are kept (see the note below).
sns.boxplot(x=train.serum_creatinine, color = 'red')
plt.show()
Before dealing with outliers we require knowledge about the outlier, the dataset and possibly some domain knowledge.
Removing outliers without a good reason will not always increase accuracy. Without a deep understanding of what are the possible ranges that
exist within each feature, removing outliers becomes tricky.
When I researched a bit I found that all the values in serum_creatinine fall within the possible range of values, so they are not outliers.
They are genuine data points that help in predicting DEATH_EVENT.
# Correlation heat-map of all features. Styler.set_precision was deprecated in
# pandas 1.3 and removed in 2.0; Styler.format(precision=...) is the supported
# replacement.
train.corr().style.background_gradient(cmap='coolwarm').format(precision=2)
# Feature Selection: rank features with an ExtraTreesClassifier.
plt.rcParams['figure.figsize'] = 15, 6
sns.set_style("darkgrid")

# Predictors are every column but the last; the target is the last column.
x = train.iloc[:, :-1]
y = train.iloc[:, -1]

from sklearn.ensemble import ExtraTreesClassifier

model = ExtraTreesClassifier()
model.fit(x, y)
print(model.feature_importances_)

# Horizontal bar chart of the twelve most important features.
importance_series = pd.Series(model.feature_importances_, index=x.columns)
importance_series.nlargest(12).plot(kind='barh')
plt.show()
We will select only 3 features : time, ejection_fraction, serum_creatinine
# Keep only the three strongest features (time, ejection_fraction,
# serum_creatinine) plus the DEATH_EVENT target.
train = train.drop(['anaemia', 'creatinine_phosphokinase', 'diabetes', 'high_blood_pressure', 'platelets', 'sex', 'smoking', 'age'], axis=1)
print(train)
# Styler.set_precision was removed in pandas 2.0; format(precision=...) is the
# supported replacement.
train.corr().style.background_gradient(cmap='coolwarm').format(precision=3)
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import accuracy_score

# Separate predictors from the target column.
x = train.drop('DEATH_EVENT', axis=1)
y = train.DEATH_EVENT
print(x.shape)
print(y.shape)

# random_state pins the split so results are reproducible between runs
# (the original call produced a different split on every execution).
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
print(x_train)
print(y_test)

from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training fold only, then apply it to the test fold.
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

# Persist the fitted scaler; the with-block guarantees the file handle is
# closed (the original `pickle.dump(sc, open(...))` leaked it).
with open('sc.pkl', 'wb') as f:
    pickle.dump(sc, f)
# Making Confusion Matrix and calculating accuracy score
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score

# Baseline model: logistic regression on the scaled features.
model = LogisticRegression()
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

# mylist accumulates every model's accuracy for the final comparison chart.
mylist = []
cm = confusion_matrix(y_test, y_pred)
acc_logreg = accuracy_score(y_test, y_pred)
mylist.append(acc_logreg)
print(cm)
print(acc_logreg)

# Evaluation metrics for the logistic-regression baseline.
acc = accuracy_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
summary = pd.Series({
    "Accuracy": acc,
    "ROC-AUC": roc_auc,
    "Precision": precision,
    "Recall": recall,
    "F1-score": f1,
})
print(summary.to_string())
# Finding the optimum number of neighbors
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

# Try k = 3..9 and record the test accuracy for each setting.
neighbor_range = range(3, 10)
list1 = []
for neighbors in neighbor_range:
    classifier = KNeighborsClassifier(n_neighbors=neighbors, metric='minkowski')
    classifier.fit(x_train, y_train)
    y_pred = classifier.predict(x_test)
    list1.append(accuracy_score(y_test, y_pred))
plt.plot(list(neighbor_range), list1)
plt.show()
# Training the K Nearest Neighbor Classifier on the Training set
# (k = 5 chosen from the accuracy sweep above).
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(x_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict(x_test)
print(y_pred)
# Making the confusion matrix and calculating accuracy score
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
acc_knn = accuracy_score(y_test, y_pred)
mylist.append(acc_knn)  # keep KNN accuracy for the model-comparison chart
print(cm)
print(acc_knn)
# Evaluation metrics
acc = accuracy_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(pd.Series({"Accuracy": acc,
"ROC-AUC": roc_auc,
"Precision": precision,
"Recall": recall,
"F1-score": f1}).to_string())
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score

# Sweep the regularization strength C and record the test accuracy for each.
c_values = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
list1 = []
for c in c_values:
    classifier = SVC(C=c, random_state=0, kernel='rbf')
    classifier.fit(x_train, y_train)
    y_pred = classifier.predict(x_test)
    list1.append(accuracy_score(y_test, y_pred))
plt.plot(c_values, list1)
plt.show()
# Training the Support Vector Classifier on the Training set
from sklearn.svm import SVC
# C = 0.7 chosen from the accuracy sweep above.
classifier = SVC(C = 0.7, random_state=0, kernel = 'rbf')
classifier.fit(x_train, y_train)
# Predicting the test set results
y_pred = classifier.predict(x_test)
print(y_pred)
# Making the confusion matrix and calculating accuracy score
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
acc_svc = accuracy_score(y_test, y_pred)
print(cm)
print(acc_svc)
mylist.append(acc_svc)  # keep SVC accuracy for the model-comparison chart
# Evaluation metrics
acc = accuracy_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(pd.Series({"Accuracy": acc,
"ROC-AUC": roc_auc,
"Precision": precision,
"Recall": recall,
"F1-score": f1}).to_string())
# Finding the optimum number of max_leaf_nodes
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

# Grow entropy trees with 2..14 leaf nodes and track the test accuracy.
leaf_range = range(2, 15)
list1 = []
for leaves in leaf_range:
    classifier = DecisionTreeClassifier(max_leaf_nodes=leaves, random_state=0, criterion='entropy')
    classifier.fit(x_train, y_train)
    y_pred = classifier.predict(x_test)
    list1.append(accuracy_score(y_test, y_pred))
plt.plot(list(leaf_range), list1)
plt.show()
# Training the Decision Tree Classifier on the Training set
# (max_leaf_nodes = 10 chosen from the accuracy sweep above).
classifier = DecisionTreeClassifier(max_leaf_nodes = 10, random_state=0, criterion='entropy')
classifier.fit(x_train, y_train)
# Predicting the test set results
y_pred = classifier.predict(x_test)
print(y_pred)
# Making the confusion matrix and calculating accuracy score
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
acc_decisiontree = accuracy_score(y_test, y_pred)
print(cm)
print(acc_decisiontree)
mylist.append(acc_decisiontree)  # keep DT accuracy for the model-comparison chart
# Evaluation metrics
acc = accuracy_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(pd.Series({"Accuracy": acc,
"ROC-AUC": roc_auc,
"Precision": precision,
"Recall": recall,
"F1-score": f1}).to_string())
# Finding the optimum number of n_estimators
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

# Sweep forest sizes 10..29 and record the test accuracy for each.
estimator_range = range(10, 30)
list1 = []
for estimators in estimator_range:
    classifier = RandomForestClassifier(n_estimators=estimators, random_state=0, criterion='entropy')
    classifier.fit(x_train, y_train)
    y_pred = classifier.predict(x_test)
    list1.append(accuracy_score(y_test, y_pred))
plt.plot(list(estimator_range), list1)
plt.show()
# Training the RandomForest Classifier on the Training set
# (n_estimators = 15 chosen from the accuracy sweep above).
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 15, criterion='entropy', random_state=0)
classifier.fit(x_train,y_train)
# Predicting the test set results
y_pred = classifier.predict(x_test)
print(y_pred)
# Making the confusion matrix and calculating the accuracy score
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
acc_randomforest = accuracy_score(y_test, y_pred)
mylist.append(acc_randomforest)  # keep RF accuracy for the model-comparison chart
print(cm)
print(acc_randomforest)
# Evaluation metrics
acc = accuracy_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(pd.Series({"Accuracy": acc,
"ROC-AUC": roc_auc,
"Precision": precision,
"Recall": recall,
"F1-score": f1}).to_string())
np.random.seed(0)
import tensorflow as tf

# Seed TensorFlow as well: np.random.seed alone does not make Keras weight
# initialisation or data shuffling reproducible.
tf.random.set_seed(0)

# Initialising the ANN: four hidden ReLU layers of 7 units feeding a single
# sigmoid output for binary classification of DEATH_EVENT.
ann = tf.keras.models.Sequential()
ann.add(tf.keras.layers.Dense(units=7, activation='relu'))
ann.add(tf.keras.layers.Dense(units=7, activation='relu'))
ann.add(tf.keras.layers.Dense(units=7, activation='relu'))
ann.add(tf.keras.layers.Dense(units=7, activation='relu'))
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

# Compiling the ANN
ann.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Training the ANN on the training set
ann.fit(x_train, y_train, batch_size=16, epochs=100)

# Predicting the test set results: threshold the sigmoid output at 0.5.
y_pred = ann.predict(x_test)
y_pred = (y_pred > 0.5)
np.set_printoptions()

# Making the confusion matrix, calculating accuracy_score
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix")
print(cm)
print()
ac_ann = accuracy_score(y_test, y_pred)
print("Accuracy")
print(ac_ann)
mylist.append(ac_ann)  # keep ANN accuracy for the model-comparison chart

# Evaluation metrics
acc = accuracy_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(pd.Series({"Accuracy": acc,
                 "ROC-AUC": roc_auc,
                 "Precision": precision,
                 "Recall": recall,
                 "F1-score": f1}).to_string())
Checking For The Accuracy Score
# Compare the models that were actually trained above. The original table also
# listed 'xgboost' and 'catboost' and referenced ac_xgboost / ac_catboost,
# which are never defined anywhere in this script (NameError at runtime), so
# those two rows were removed.
models = pd.DataFrame({
    'Model': ['Support Vector Machines', 'KNN', 'Logistic Regression',
              'Random Forest', 'ANN', 'Decision Tree'],
    'Score': [acc_svc, acc_knn, acc_logreg,
              acc_randomforest, ac_ann, acc_decisiontree],
})
# sort_values returns a new DataFrame; print it so the ranking is visible
# when run as a script.
print(models.sort_values(by='Score', ascending=False))
Accuracy Of Different Classifier Models
# Accuracy of the different classifier models as an annotated bar chart.
plt.rcParams['figure.figsize'] = 15, 6
sns.set_style("darkgrid")
ax = sns.barplot(x=models.Model, y=models.Score, palette="rocket", saturation=1.5)
plt.xlabel("Classifier Models", fontsize=20)
plt.ylabel("% of Accuracy", fontsize=20)
plt.title("Accuracy of different Classifier Models", fontsize=20)
plt.xticks(fontsize=12, horizontalalignment='center', rotation=8)
plt.yticks(fontsize=13)
# Label each bar with its accuracy as a percentage, centred just above it.
for p in ax.patches:
    width, height = p.get_width(), p.get_height()
    x, y = p.get_xy()
    ax.annotate(f'{height:.2%}', (x + width / 2, y + height * 1.02),
                ha='center', fontsize='x-large')
plt.show()