
Cardiovascular Disease Detection using Machine Learning


Model Overview

Heart disease is a fatal condition that is increasing rapidly in both developed
and developing countries, and it is a leading cause of death.
In this use case, a machine learning model predicts whether or not a person is
affected by cardiovascular disease.

Problem statement:

Heart disease is increasing rapidly across the globe and frequently leads to death.
The goal of this use case is to predict, from a person's demographic, examination,
and lifestyle attributes, whether or not they have cardiovascular disease.
This prediction gives a person an indication of their health condition so that
they can take precautions early.

Usage domains and advantages:
● The medical industry can use this model to identify patients at risk of cardiovascular disease.
● Individuals can use this model to check their own condition.
● The model helps users take precautions regarding their cardiovascular health.
● Early forecasting of cardiovascular disease with this model helps reduce risk.
● Machine learning systems for cardiovascular disease detection are known as intelligent computational predictive systems and have proven very effective across the medical industry.

Model solution:
The problem is to detect whether or not a person has cardiovascular disease,
which is a binary classification task.
A machine learning classification algorithm is therefore the most suitable
approach; the chosen model is discussed in detail below.

Dataset/Data Source:
The dataset comes from Kaggle's healthcare and medical datasets:
https://www.kaggle.com/sulianova/cardiovascular-disease-dataset
The dataset contains variables describing each person's details; those variables
are used to identify the person's condition.

import pandas as pd

# Load the dataset; values are separated by semicolons
Cardio = pd.read_csv('cardio_train.csv', sep=';')

Features/Variables:
● Age | Objective Feature | age | int (days) |
● Height | Objective Feature | height | int (cm) |
● Weight | Objective Feature | weight | float (kg) |
● Gender | Objective Feature | gender | categorical code |
● Systolic blood pressure | Examination Feature | ap_hi | int |
● Diastolic blood pressure | Examination Feature | ap_lo | int |
● Cholesterol | Examination Feature | cholesterol | 1: normal, 2: above normal, 3: well above normal |
● Glucose | Examination Feature | gluc | 1: normal, 2: above normal, 3: well above normal |
● Smoking | Subjective Feature | smoke | binary |
● Alcohol intake | Subjective Feature | alco | binary |
● Physical activity | Subjective Feature | active | binary |
● Presence or absence of cardiovascular disease | Target Variable | cardio | binary |
The names between vertical bars (age, height, weight, and so on) are the column names/IDs in the dataset.
The variable cardio is used as the target variable, determining the person's condition:
0 represents a person who does not have cardiovascular disease
1 represents a person who has cardiovascular disease

0    35021
1    34979
Name: cardio, dtype: int64

Dataset shape:
70000 x 12 - 70,000 rows, 12 columns
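
The class counts and dataset shape shown above can be inspected with a minimal sketch like the following (assuming the Cardio dataframe loaded earlier):

# Class distribution of the target variable
Cardio['cardio'].value_counts()

# (rows, columns) of the dataframe
Cardio.shape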

Preprocessing:
Dropped 24 duplicate rows.
Converted age from days to years (a sketch of these two steps is shown below).
Renamed columns: ap_hi is renamed Systolic_bp and ap_lo is renamed Diastolic_bp,
which are more informative than the original column names (the rename code
follows the sketch).
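
A minimal sketch of the duplicate removal and age conversion steps; the exact code is not shown in the report, and truncating to whole years in the conversion is an assumption:

# Drop the 24 duplicate rows reported above
Cardio.drop_duplicates(inplace=True)

# Convert age from days to years (truncating to whole years is an assumption)
Cardio['age'] = (Cardio['age'] / 365).astype(int)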

Cardio.rename(columns={'ap_hi': 'Systolic_bp'}, inplace=True)
Cardio.rename(columns={'ap_lo': 'Diastolic_bp'}, inplace=True)

Split the dataframe into the independent variables X and the target variable y:


X = Cardio.drop(['cardio'], axis=1)
y = Cardio['cardio'].copy()

Model Used:
The problem is a classification task: classifying whether or not a person has
cardiovascular disease.
Decision Tree Classifier: decision trees are supervised machine learning models
in which the data is repeatedly split according to a specific parameter.
A decision tree uses a tree representation to predict the value of the target
variable: each leaf node is linked to a class label, and the attributes are
tested at the internal nodes.


from sklearn.tree import DecisionTreeClassifier

# Decision tree with a maximum depth of 7
D_tree = DecisionTreeClassifier(max_depth=7)
D_tree.fit(X, y)

DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None,
criterion='gini', max_depth=7, max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
presort='deprecated', random_state=None, splitter='best')

I chose a Decision Tree with max_depth set to 7. The key parameters are:
max_depth - the maximum depth of the tree (the longest path from the root node to a leaf).
criterion - measures the quality of a split; the supported criteria are 'gini'
(Gini impurity) and 'entropy' (information gain), and 'gini' is used here.
min_samples_split - the minimum number of samples required to split a node; it
takes an integer value.
presort - was used to speed up finding the best split of the data; it is
deprecated in recent scikit-learn versions.
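
For illustration, these parameters could also be passed explicitly when constructing the classifier; only max_depth=7 matches the fitted model above, while the other values simply restate the defaults shown in the printed estimator:

from sklearn.tree import DecisionTreeClassifier

# max_depth=7 matches the model above; criterion and min_samples_split
# restate the scikit-learn defaults shown in the printed estimator
D_tree_explicit = DecisionTreeClassifier(
    max_depth=7,           # maximum depth of the tree
    criterion='gini',      # split quality measure ('entropy' for information gain)
    min_samples_split=2,   # minimum samples required to split an internal node
)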

The cross_val_score function from the scikit-learn library uses the Stratified
K-Fold cross-validation technique by default when the estimator is a classifier
and the dataset is a classification dataset.


from sklearn.model_selection import cross_val_score

# 1000-fold stratified cross-validation scores for the decision tree
scores1 = cross_val_score(D_tree, X, y, cv=1000)

In the above code:
D_tree - the model, i.e. the decision tree classifier.
X - the dataframe of independent variables used for classification.
y - the target variable.
cv - the cross-validation strategy (the number of folds); here 1000-fold
Stratified K-Fold cross-validation is used.
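
The report does not show how the summary metrics below were computed; one possible sketch, using the cross-validation scores above together with cross-validated predictions to derive accuracy, precision, recall, and F1-score, is:

from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Mean cross-validation accuracy over the 1000 folds
print("Mean CV accuracy:", scores1.mean())

# One way to obtain precision/recall/F1: cross-validated predictions
# (10 folds here is an illustrative choice to keep the run fast)
y_pred = cross_val_predict(D_tree, X, y, cv=10)
print("Accuracy :", accuracy_score(y, y_pred))
print("Precision:", precision_score(y, y_pred))
print("Recall   :", recall_score(y, y_pred))
print("F1-score :", f1_score(y, y_pred))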

Results:
Below are the results yielded by the model.

Model                    | Accuracy | Precision | Recall | F1-Score
Decision Tree Classifier | 88.57    | 74.18     | 73.64  | 73.52
