Author: Prasad Chaskar


Detecting Of Diabetes Using Health Indicators


Model Overview

Diabetes is among the most prevalent chronic diseases in the United States, affecting millions of Americans each year and placing a significant financial burden on the economy. It is a serious chronic disease in which individuals lose the ability to effectively regulate blood glucose levels, and it can lead to reduced quality of life and life expectancy.

During digestion, foods are broken down into sugars, which are then released into the bloodstream. This signals the pancreas to release insulin, which enables cells in the body to use those sugars for energy. Diabetes is generally characterized either by the body not making enough insulin or by the body being unable to use the insulin it makes as effectively as needed.

Complications such as heart disease, vision loss, lower-limb amputation, and kidney disease are associated with chronically high levels of sugar remaining in the bloodstream. While there is no cure for diabetes, strategies like losing weight, eating healthily, being active, and receiving medical treatment can mitigate the harms of the disease in many patients. Early diagnosis can lead to lifestyle changes and more effective treatment, which makes predictive models of diabetes risk important tools for the public and for public health officials.

About Dataset : 

diabetes_binary_5050split_health_indicators_BRFSS2015.csv is a clean dataset of 70,692 survey responses to the CDC's BRFSS2015 survey. It has an equal 50-50 split of respondents with no diabetes and respondents with either prediabetes or diabetes. The target variable Diabetes_binary has 2 classes: 0 is for no diabetes, and 1 is for prediabetes or diabetes. The dataset has 21 feature variables and is balanced.
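The 50-50 balance described above can be confirmed once the file is loaded in the Read Data step. A minimal sketch, assuming the `Diabetes_binary` column is coded 0/1 as described; the helper `class_balance` and the toy DataFrame are hypothetical stand-ins, not part of the original notebook.

```python
import pandas as pd


def class_balance(df: pd.DataFrame, target: str = "Diabetes_binary") -> pd.Series:
    """Return the share of rows in each target class (hypothetical helper)."""
    # normalize=True converts counts to proportions; sort_index lists class 0 before class 1
    return df[target].value_counts(normalize=True).sort_index()


# Toy stand-in for the real CSV (made-up values): two rows per class
toy = pd.DataFrame({"Diabetes_binary": [0, 1, 0, 1], "BMI": [24, 31, 27, 35]})
balance = class_balance(toy)
print(balance)
```

On the real file, `class_balance(diabetes_df)` should show 0.5 for both classes, i.e. 35,346 respondents each.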

Dataset Link :

Feature Sets :
Diabetes_binary : 0 = no diabetes, 1 = prediabetes or diabetes
HighBP : 0 = no high BP, 1 = high BP
HighChol : 0 = no high cholesterol, 1 = high cholesterol
CholCheck : 0 = no cholesterol check in 5 years, 1 = yes cholesterol check in 5 years
BMI : Body Mass Index
Smoker : Have you smoked at least 100 cigarettes in your entire life? [Note: 5 packs = 100 cigarettes] 0 = no, 1 = yes
Stroke : (Ever told) you had a stroke. 0 = no, 1 = yes
HeartDiseaseorAttack : coronary heart disease (CHD) or myocardial infarction (MI). 0 = no, 1 = yes
PhysActivity : physical activity in past 30 days, not including job. 0 = no, 1 = yes
Fruits : Consume fruit 1 or more times per day. 0 = no, 1 = yes
Veggies : Consume vegetables 1 or more times per day. 0 = no, 1 = yes
HvyAlcoholConsump : Heavy drinkers (adult men having more than 14 drinks per week and adult women having more than 7 drinks per week). 0 = no, 1 = yes
AnyHealthcare : Have any kind of health care coverage, including health insurance, prepaid plans such as HMO, etc. 0 = no, 1 = yes
NoDocbcCost : Was there a time in the past 12 months when you needed to see a doctor but could not because of cost? 0 = no, 1 = yes
GenHlth : Would you say that in general your health is: scale 1-5. 1 = excellent, 2 = very good, 3 = good, 4 = fair, 5 = poor
MentHlth : Now thinking about your mental health, which includes stress, depression, and problems with emotions, for how many days during the past 30 days was your mental health not good? Number of days, 0-30
PhysHlth : Now thinking about your physical health, which includes physical illness and injury, for how many days during the past 30 days was your physical health not good? Number of days, 0-30
DiffWalk : Do you have serious difficulty walking or climbing stairs? 0 = no, 1 = yes
Sex : 0 = female, 1 = male
Age : 13-level age category (_AGEG5YR, see codebook). 1 = 18-24, 9 = 60-64, 13 = 80 or older
Education : Education level (EDUCA, see codebook), scale 1-6. 1 = Never attended school or only kindergarten, 2 = Grades 1 through 8 (Elementary), 3 = Grades 9 through 11 (Some high school), 4 = Grade 12 or GED (High school graduate), 5 = College 1 year to 3 years (Some college or technical school), 6 = College 4 years or more (College graduate)
Income : Income scale (INCOME2, see codebook), scale 1-8. 1 = less than $10,000, 5 = less than $35,000, 8 = $75,000 or more
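The coded values above double as a sanity check: every coded column should stay inside its documented range. A minimal sketch, assuming the column names exactly as listed; `CODEBOOK_RANGES` and `out_of_range` are hypothetical names, and the toy rows are made up. Only features with fully coded values are checked here.

```python
import pandas as pd

# Inclusive value ranges transcribed from the feature list (hypothetical constant).
# BMI, MentHlth and PhysHlth are omitted; only coded features are checked.
BINARY = ["HighBP", "HighChol", "Smoker", "Stroke", "HeartDiseaseorAttack",
          "PhysActivity", "Fruits", "Veggies", "HvyAlcoholConsump",
          "AnyHealthcare", "NoDocbcCost", "DiffWalk", "Sex"]
CODEBOOK_RANGES = {col: (0, 1) for col in BINARY}
CODEBOOK_RANGES.update({"GenHlth": (1, 5), "Age": (1, 13),
                        "Education": (1, 6), "Income": (1, 8)})


def out_of_range(df: pd.DataFrame, ranges: dict) -> dict:
    """Count values outside each column's documented range (NaN counts as out of range)."""
    problems = {}
    for col, (low, high) in ranges.items():
        if col not in df.columns:
            continue  # feature not present in this DataFrame
        # between() is inclusive on both ends by default
        bad = int((~df[col].between(low, high)).sum())
        if bad:
            problems[col] = bad
    return problems


# Made-up rows: one HighBP value and one Age code fall outside the codebook
toy = pd.DataFrame({"HighBP": [0, 1, 2], "GenHlth": [1, 5, 3], "Age": [1, 13, 14]})
report = out_of_range(toy, CODEBOOK_RANGES)
print(report)  # {'HighBP': 1, 'Age': 1}
```

On the real data, `out_of_range(diabetes_df, CODEBOOK_RANGES)` should return an empty dict if the file matches the codebook.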

Let's look at the code:

Import Libraries :

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
from tensorflow import keras
from sklearn.metrics import classification_report

Read Data :

diabetes_df = pd.read_csv('diabetes_binary_5050split_health_indicators_BRFSS2015.csv')

Shape of Dataset :

diabetes_df.shape

The dataset contains 70,692 rows and 22 columns (21 features plus the Diabetes_binary target).
Check for NULL Values :

diabetes_df.isnull().sum()

Data Visualization :
Plot for age feature : 
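The plotting code for the age feature is not shown on this page. As a complementary sketch (not the original plot), the share of respondents labelled 1 in each age category summarizes the same relationship in numbers; `target_rate_by` is a hypothetical helper and the toy rows are invented.

```python
import pandas as pd


def target_rate_by(df: pd.DataFrame, feature: str,
                   target: str = "Diabetes_binary") -> pd.Series:
    """Share of rows with target == 1 within each category of `feature`."""
    # The mean of a 0/1 column is the proportion of 1s in each group
    return df.groupby(feature)[target].mean()


# Made-up rows covering three of the 13 age codes (1 = 18-24, 9 = 60-64, 13 = 80+)
toy = pd.DataFrame({
    "Age": [1, 1, 9, 9, 13, 13],
    "Diabetes_binary": [0, 0, 0, 1, 1, 1],
})
rates = target_rate_by(toy, "Age")
print(rates)
```

Because the dataset was balanced to a 50-50 split, these shares describe the sample, not the prevalence of diabetes in the US population. `target_rate_by(diabetes_df, "Age").plot(kind="bar")` would draw them as a bar chart.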


Create two DataFrames: (1) patients with diabetes and (2) patients without diabetes.

patients_with_db = diabetes_df[diabetes_df.Diabetes_binary==1]
patients_without_db = diabetes_df[diabetes_df.Diabetes_binary==0]

Plot for High Blood Pressure :
0 - no high BP
1 - high BP

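fig, axes = plt.subplots(1, 2, figsize=(18,6), sharey=True);
sns.countplot(x='HighBP', data=patients_with_db, ax=axes[0]);
axes[0].set_title('Patients with diabetes: count by high BP');
sns.countplot(x='HighBP', data=patients_without_db, ax=axes[1]);
axes[1].set_title('Patients without diabetes: count by high BP');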

People with diabetes are more likely to have high blood pressure than people without diabetes.
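The count plots can be condensed into one table of shares, which makes claims like the one about blood pressure easier to check. A minimal sketch, assuming binary 0/1 features as in the feature list; `prevalence_by_class` is a hypothetical helper and the toy rows are made up.

```python
import pandas as pd


def prevalence_by_class(df: pd.DataFrame, features,
                        target: str = "Diabetes_binary") -> pd.DataFrame:
    """Share of respondents with each binary feature == 1, split by target class."""
    # Rows: target class (0, 1); columns: features; values: mean of the 0/1 flags
    return df.groupby(target)[list(features)].mean()


# Made-up rows, four per class
toy = pd.DataFrame({
    "Diabetes_binary": [0, 0, 0, 0, 1, 1, 1, 1],
    "HighBP":          [0, 0, 1, 0, 1, 1, 0, 1],
    "HighChol":        [0, 1, 0, 0, 1, 0, 1, 1],
})
table = prevalence_by_class(toy, ["HighBP", "HighChol"])
print(table)
```

`prevalence_by_class(diabetes_df, ["HighBP", "HighChol", "Smoker", "PhysActivity", "Fruits", "Veggies"])` puts every feature plotted in this section side by side. Unlike raw counts, shares stay comparable even when the two groups differ in size.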
Plot for cholesterol and smoker features :

0 - no high cholesterol / non-smoker
1 - high cholesterol / smoker

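fig, axes = plt.subplots(2, 2, figsize=(18,10), sharey=True);

sns.countplot(x='HighChol', data=patients_with_db, ax=axes[0][0]);
axes[0][0].set_title('Patients with diabetes: count by high cholesterol');

sns.countplot(x='HighChol', data=patients_without_db, ax=axes[0][1]);
axes[0][1].set_title('Patients without diabetes: count by high cholesterol');

sns.countplot(x='Smoker', data=patients_with_db, ax=axes[1][0]);
axes[1][0].set_title('Patients with diabetes: count by smoking status');

sns.countplot(x='Smoker', data=patients_without_db, ax=axes[1][1]);
axes[1][1].set_title('Patients without diabetes: count by smoking status');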

Plot for PhysActivity feature :
0 - No
1 - Yes

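fig, axes = plt.subplots(1, 2, figsize=(18,6), sharey=True);
sns.countplot(x='PhysActivity', data=patients_with_db, ax=axes[0]);
axes[0].set_title('Physical activity in past 30 days: patients with diabetes');
sns.countplot(x='PhysActivity', data=patients_without_db, ax=axes[1]);
axes[1].set_title('Physical activity in past 30 days: patients without diabetes');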

Plot for Fruits and Veggies :
0 - No
1 - Yes

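fig, axes = plt.subplots(2, 2, figsize=(18,10), sharey=True);

sns.countplot(x='Fruits', data=patients_with_db, ax=axes[0][0]);
axes[0][0].set_title('Fruit 1 or more times per day: patients with diabetes');
sns.countplot(x='Fruits', data=patients_without_db, ax=axes[0][1]);
axes[0][1].set_title('Fruit 1 or more times per day: patients without diabetes');

sns.countplot(x='Veggies', data=patients_with_db, ax=axes[1][0]);
axes[1][0].set_title('Vegetables 1 or more times per day: patients with diabetes');
sns.countplot(x='Veggies', data=patients_without_db, ax=axes[1][1]);
axes[1][1].set_title('Vegetables 1 or more times per day: patients without diabetes');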

Split Data :

X = diabetes_df.drop('Diabetes_binary',axis=1)
y = diabetes_df.Diabetes_binary

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=11)
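With `test_size=0.2`, the 70,692 rows split into the counts computed below. A small arithmetic sketch, assuming scikit-learn's behaviour of rounding a fractional test size up to the next whole row and giving the remainder to training; `split_sizes` is a hypothetical helper, not a scikit-learn function.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple:
    """Train/test row counts for a float test_size, mirroring scikit-learn's rounding."""
    # The test count is rounded up (ceil); training gets the remaining rows
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


n_train, n_test = split_sizes(70_692, 0.2)
print(n_train, n_test)  # 56553 14139
```

Passing `stratify=y` to `train_test_split` would additionally keep the 50-50 class ratio in both splits; the call above does not use it.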

Data Standardization :

# Fit the scaler on the training split only, then reuse its mean/std for the test split
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
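StandardScaler rescales each column to z = (x - mean) / std, where mean and std are that column's statistics on the training split. The NumPy sketch below reproduces that arithmetic to show why `fit_transform` is called on the training data and only `transform` on the test data. The function names are hypothetical, and matching StandardScaler's defaults (population standard deviation, zero-variance columns left unscaled) is stated here as an assumption.

```python
import numpy as np


def fit_standardizer(X_train: np.ndarray):
    """Per-column mean and standard deviation, estimated on training rows only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)  # ddof=0, i.e. the population standard deviation
    # A constant column has std 0; use 1.0 so it is centred but not divided by zero
    std = np.where(std == 0, 1.0, std)
    return mean, std


def standardize(X: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Apply z = (x - mean) / std with statistics taken from the training split."""
    return (X - mean) / std


# Made-up toy data: column 0 varies, column 1 is constant
X_train_toy = np.array([[1.0, 10.0], [3.0, 10.0]])
X_test_toy = np.array([[2.0, 10.0], [5.0, 10.0]])
mean, std = fit_standardizer(X_train_toy)
print(standardize(X_train_toy, mean, std))
print(standardize(X_test_toy, mean, std))
```

The test value 5.0 maps to 3.0: test data is not forced to mean 0 and standard deviation 1, because its own statistics are never used for fitting.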

Model Building :

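model = keras.Sequential([
    keras.layers.Dense(25, input_shape=(21,), activation='relu'),
    keras.layers.Dense(55, activation='relu'),
    keras.layers.Dense(65, activation='relu'),
    keras.layers.Dense(75, activation='relu'),
    keras.layers.Dense(85, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')  # probability of class 1
])
# Compile settings are not shown on the page; adam + binary cross-entropy is an assumed, standard choice for a sigmoid output
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# validation_data provides the val_accuracy / val_loss curves plotted as 'test' below
history = model.fit(X_train, y_train, epochs=50, validation_data=(X_test, y_test))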
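Each Dense layer with n_in inputs and n_out units holds n_in * n_out weights plus n_out biases. Summing that over the layers above gives the network's size; the total below should match what Keras' `model.summary()` reports, assuming the 21-feature input given in `input_shape`. `dense_param_count` is a hypothetical helper.

```python
def dense_param_count(layer_sizes):
    """Trainable parameters of a stack of fully connected (Dense) layers.

    Each layer with n_in inputs and n_out units has n_in * n_out weights
    plus n_out biases.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


# 21 input features, then the hidden widths and the single sigmoid output
sizes = [21, 25, 55, 65, 75, 85, 1]
print(dense_param_count(sizes))  # 17116
```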

Model Evaluation :

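# Predicted probabilities of class 1, shape (n_samples, 1)
yp = model.predict(X_test)
# Threshold at 0.5 to turn probabilities into 0/1 labels
y_pred = (yp > 0.5).astype(int).ravel()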

Classification Report :

print(classification_report(y_test, y_pred))

Confusion Matrix :

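cm = tf.math.confusion_matrix(labels=y_test, predictions=y_pred)
plt.figure(figsize=(10,7))
# Rows are true labels, columns are predicted labels
sns.heatmap(cm.numpy(), annot=True, fmt='d')
plt.xlabel('Predicted')
plt.ylabel('Truth')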
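The class-1 numbers in the classification report can be read straight off the confusion matrix. A plain-Python sketch, assuming the layout of `tf.math.confusion_matrix` (rows = true labels, columns = predictions); `metrics_from_confusion` is a hypothetical helper, and the example counts are invented, not this model's results.

```python
def metrics_from_confusion(cm):
    """Accuracy plus precision, recall and F1 for class 1 from a 2x2 confusion matrix.

    Rows are true labels and columns are predictions, so cm[1][1] counts
    true positives and cm[0][1] false positives.
    """
    tn, fp = cm[0][0], cm[0][1]
    fn, tp = cm[1][0], cm[1][1]
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented counts for illustration only
example = [[40, 10], [5, 45]]
scores = metrics_from_confusion(example)
print(scores)
```

`metrics_from_confusion(cm.numpy())` should agree with the class-1 precision, recall and F1 and with the overall accuracy printed by `classification_report`.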

Accuracy Plot for Model :

plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.legend(['train', 'test'], loc='upper left')

Plot Loss for Model :

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.legend(['train', 'test'], loc='upper left')
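When the test (validation) loss stops falling while the training loss keeps dropping, later epochs are likely overfitting. A small sketch that finds the epoch with the lowest validation loss in a `history.history`-style dict; `best_epoch` is a hypothetical helper and the curves are made up. Keras can also handle this during training with `keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)`, where the patience value is only an example.

```python
def best_epoch(history: dict, monitor: str = "val_loss") -> tuple:
    """Return the 1-based epoch with the lowest value of `monitor`, and that value."""
    values = history[monitor]
    # Index of the smallest value; ties resolve to the earliest epoch
    idx = min(range(len(values)), key=values.__getitem__)
    return idx + 1, values[idx]


# Made-up curves shaped like Keras' History.history dict
toy_history = {
    "loss": [0.60, 0.52, 0.49, 0.47, 0.45],
    "val_loss": [0.58, 0.53, 0.51, 0.52, 0.54],
}
print(best_epoch(toy_history))  # (3, 0.51)
```

On the trained model, `best_epoch(history.history)` gives the epoch after which the test loss no longer improved.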

Thank you!