
Tarun Reddy's other Models Reports


Diabetes Prediction using Support Vector Machine


Model Overview

Diabetes mellitus, commonly known as diabetes, is a metabolic disease that causes high blood glucose (blood sugar). The hormone insulin moves sugar from the blood into your cells to be stored or used for energy. With diabetes, your body either doesn't make enough insulin or can't effectively use the insulin it makes, so it isn't able to properly process and use the glucose from the food you eat. Untreated high blood glucose can damage your nerves, eyes, kidneys, and other organs. There are different types of diabetes, each with different causes, but they all share the common problem of having too much glucose in the bloodstream. Treatments include medications and/or insulin, and some types of diabetes can be prevented by adopting a healthy lifestyle.


The dataset contains the following columns:
Pregnancies: Number of times pregnant
Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
BloodPressure: Diastolic blood pressure (mm Hg)
SkinThickness: Triceps skin fold thickness (mm)
Insulin: 2-hour serum insulin (μU/ml)
BMI: Body mass index (weight in kg/(height in m)^2)
DiabetesPedigreeFunction: Diabetes pedigree function
Age: Age (years)
Outcome: Class variable (0 = non-diabetic, 1 = diabetic)


Let's import the required libraries.

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pickle

Now let us load the dataset and check its first five rows.

diabetes_dataset = pd.read_csv('E:\\diabetes\\diabetes.csv')
diabetes_dataset.head()


Let us now check the dimensions of the dataset.

diabetes_dataset.shape

In the next step, let us check the statistical measures of the data.

diabetes_dataset.describe()

Now let us count how many diabetic and non-diabetic examples there are.

diabetes_dataset['Outcome'].value_counts()

We can see that there are 500 non-diabetic examples (Outcome = 0) and 268 diabetic examples (Outcome = 1).
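
These counts also give a useful reference point: a trivial model that always predicts the majority class (non-diabetic) would already be right on 500 of the 768 examples. A minimal sketch of that arithmetic, using only the two counts reported above:

```python
# Class counts reported above for the Outcome column
non_diabetic = 500
diabetic = 268
total = non_diabetic + diabetic  # 768 examples in total

# Share of diabetic examples in the dataset
diabetic_share = diabetic / total
# Accuracy of a baseline that always predicts "not diabetic" (Outcome = 0)
majority_baseline_accuracy = non_diabetic / total

print(f"Total examples: {total}")
print(f"Diabetic share: {diabetic_share:.3f}")
print(f"Majority-class baseline accuracy: {majority_baseline_accuracy:.3f}")
```

Any accuracy reported later is easier to judge against this baseline of roughly 65%.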

In the next step, let us find the correlation between the features.

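import seaborn as sns
# Pairwise correlation matrix of all columns, drawn as an annotated heatmap
corr_mat = diabetes_dataset.corr()
sns.heatmap(corr_mat, annot=True)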
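The heatmap shows every pairwise correlation. When the question is which features move most with the label, it can be easier to read the Outcome column of the correlation matrix on its own, sorted. A minimal sketch, using a tiny hypothetical DataFrame (made-up values, only three of the feature columns) in place of the real CSV:

```python
import pandas as pd

# Hypothetical stand-in for diabetes_dataset: a few made-up rows, three features
df = pd.DataFrame({
    "Glucose": [85, 168, 183, 89, 137, 116],
    "BMI": [26.6, 38.0, 23.3, 28.1, 43.1, 25.6],
    "Age": [31, 34, 32, 21, 33, 30],
    "Outcome": [0, 1, 1, 0, 1, 0],
})

corr_mat = df.corr()

# Correlation of each feature with the label, strongest positive first
outcome_corr = corr_mat["Outcome"].drop("Outcome").sort_values(ascending=False)
print(outcome_corr)
```

With the real data, the same selection and sort can be applied to `corr_mat` directly.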

Now let us check for any null values present in the dataset.

diabetes_dataset.isnull().sum()

There are no null values present in the dataset.
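
The null check only counts entries stored as NaN. In medical tables, a missing measurement is sometimes recorded as 0 instead, and 0 is not a plausible reading for columns such as Glucose, BloodPressure, or BMI, so it may be worth counting zeros as well. A minimal sketch on a tiny hypothetical DataFrame with made-up rows; which columns to check is an assumption, not something stated in the walkthrough:

```python
import pandas as pd

# Hypothetical stand-in for diabetes_dataset with a few made-up rows
df = pd.DataFrame({
    "Glucose": [148, 0, 183, 89],
    "BloodPressure": [72, 66, 0, 66],
    "BMI": [33.6, 26.6, 23.3, 0.0],
    "Outcome": [1, 0, 1, 0],
})

# Columns where 0 is assumed to be implausible as a real measurement
cols_to_check = ["Glucose", "BloodPressure", "BMI"]

# isnull() finds no NaN values here...
null_counts = df[cols_to_check].isnull().sum()
# ...but counting zeros reveals entries that may really be missing
zero_counts = (df[cols_to_check] == 0).sum()

print(null_counts)
print(zero_counts)
```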

Now let us separate the features from the target label and store them in different variables.

X = diabetes_dataset.drop(columns='Outcome')
Y = diabetes_dataset['Outcome']


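Now let us split the dataset into a training set and a test set, holding out 20% of the examples for testing and using stratify=Y so both sets keep the same class ratio.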

X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.2, stratify=Y, random_state=2)
print(X.shape, X_train.shape, X_test.shape)
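To see what stratify=Y does, here is a minimal sketch that checks the class ratio after splitting a synthetic label vector with the same class counts reported above (500 zeros and 268 ones). The feature matrix is random placeholder data, not the real features:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with the class counts reported above: 500 zeros, 268 ones
Y = np.array([0] * 500 + [1] * 268)
# Placeholder feature matrix: 768 rows x 8 columns of random values
X = np.random.default_rng(0).normal(size=(768, 8))

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, stratify=Y, random_state=2
)

print(X.shape, X_train.shape, X_test.shape)
# Fraction of diabetic examples in each set
print(f"full: {Y.mean():.3f}  train: {Y_train.mean():.3f}  test: {Y_test.mean():.3f}")
```

With 768 rows and test_size=0.2, scikit-learn puts ceil(0.2 × 768) = 154 rows in the test set and the remaining 614 in the training set, and the diabetic share stays close to 268/768 in both.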

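Now let us build the model, a support vector machine classifier with a linear kernel, and train it on the training data.

from sklearn import svm
# Support vector classifier with a linear kernel
classifier = svm.SVC(kernel='linear')
# Train the classifier on the training data
classifier.fit(X_train, Y_train)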
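StandardScaler is imported at the top but never applied, so the SVM is trained on raw feature values whose ranges differ widely (for example Insulin = 175 versus DiabetesPedigreeFunction = 0.587 in the sample input used for prediction). One common way to add scaling is a scikit-learn Pipeline, which fits the scaler on the training data only and reuses the same scaling at prediction time. This is a sketch of that alternative, not the walkthrough's own method, and it runs on random placeholder data:

```python
import numpy as np
from sklearn import svm
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder training data: 100 rows x 8 features on very different scales
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 8)) * np.array([3, 30, 12, 15, 100, 7, 0.3, 12])
Y_train = (X_train[:, 1] > 0).astype(int)  # label driven by the second column

# Scaler and linear SVM chained together: fit() standardizes, then trains the SVM
classifier = make_pipeline(StandardScaler(), svm.SVC(kernel='linear'))
classifier.fit(X_train, Y_train)

# predict() applies the same fitted scaling before calling the SVM
predictions = classifier.predict(X_train[:5])
print(predictions)
```

Because the scaling lives inside the pipeline, the single-example prediction step would not need a separate transform call.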

Let us check the accuracy on the training data.

# accuracy score on the training data
X_train_prediction = classifier.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
print('Accuracy score of the training data : ', training_data_accuracy)
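accuracy_score is simply the fraction of predictions that match the true labels. A small worked example with made-up labels, computed both by hand and with scikit-learn:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up true labels and predictions for six examples
y_true = np.array([0, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1])

# Fraction of positions where the prediction equals the true label
manual_accuracy = np.mean(y_true == y_pred)
sklearn_accuracy = accuracy_score(y_true, y_pred)

print(manual_accuracy, sklearn_accuracy)
```

Four of the six predictions match, so both values are 4/6. accuracy_score takes the true labels first; for accuracy the order does not change the result, but for metrics such as precision and recall it does.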

Let us check the accuracy on the test data.

# accuracy score on the test data
X_test_prediction = classifier.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
print('Accuracy score of the test data : ', test_data_accuracy)

Let us also print the classification report for the test data.

from sklearn.metrics import classification_report
# Per-class precision, recall and F1 (true labels first, then predictions)
print(classification_report(Y_test, X_test_prediction))
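The classification report summarizes precision, recall and F1 for each class. For the diabetic class (1), precision is the share of predicted-diabetic cases that really are diabetic, and recall is the share of actual diabetic cases the model catches. A small worked example with made-up labels, computing both from the confusion-matrix counts and checking them against scikit-learn:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels: 1 = diabetic, 0 = not diabetic
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 1, 0, 0, 0, 0])

# For binary labels [0, 1], ravel() returns tn, fp, fn, tp in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # of the cases predicted diabetic, how many are
recall = tp / (tp + fn)     # of the truly diabetic cases, how many were found

print(tn, fp, fn, tp)
print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
```

Here precision is 3/5 and recall is 3/4. Swapping the two arguments would swap these numbers, which is why the true labels go first.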

Now let us check the model by passing it a sample data point.

input_data = (5,166,72,19,175,25.8,0.587,51)
# Convert the tuple to a NumPy array and reshape it into a single row (1 sample, 8 features)
input_data_as_numpy_array = np.asarray(input_data)
input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)
# The classifier was trained on unscaled features, so the input is passed in unscaled
std_data = input_data_reshaped

prediction = classifier.predict(std_data)

if prediction[0] == 0:
    print('The person is not diabetic')
else:
    print('The person is diabetic')
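The single-example steps above can be wrapped in a small helper. A sketch under assumptions: the function name predict_diabetes is hypothetical, the input is placed in a pandas DataFrame with the same column names as X so the model sees named features, and the model is trained here on random placeholder data only so that the example runs on its own:

```python
import numpy as np
import pandas as pd
from sklearn import svm

FEATURES = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness',
            'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age']


def predict_diabetes(classifier, input_data):
    """Hypothetical helper: return a readable label for one example of 8 feature values."""
    # One-row DataFrame with the same column names the model was trained on
    row = pd.DataFrame([input_data], columns=FEATURES)
    prediction = classifier.predict(row)
    if prediction[0] == 0:
        return 'The person is not diabetic'
    return 'The person is diabetic'


# Placeholder model trained on random data just so the sketch is runnable
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.uniform(0, 200, size=(50, 8)), columns=FEATURES)
Y_demo = (X_demo['Glucose'] > 100).astype(int)
demo_classifier = svm.SVC(kernel='linear').fit(X_demo, Y_demo)

print(predict_diabetes(demo_classifier, (5, 166, 72, 19, 175, 25.8, 0.587, 51)))
```

With the real model, the trained classifier from above would be passed in place of demo_classifier.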

Finally, let us save the model.
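The pickle module imported at the start can persist the trained classifier to disk and load it back later. A minimal sketch: the filename trained_model.sav is a hypothetical choice, and a small placeholder model stands in for the one trained above so the example runs on its own:

```python
import pickle

import numpy as np
from sklearn import svm

# Placeholder model standing in for the trained classifier
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 8))
Y_demo = (X_demo[:, 0] > 0).astype(int)
classifier = svm.SVC(kernel='linear').fit(X_demo, Y_demo)

filename = 'trained_model.sav'  # hypothetical file name

# Save: serialize the fitted model to a binary file
with open(filename, 'wb') as f:
    pickle.dump(classifier, f)

# Load: read it back and confirm it predicts exactly like the original
with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)

print((loaded_model.predict(X_demo) == classifier.predict(X_demo)).all())
```

In a later session, the loaded model's predict method can be used in the same way as in the prediction step above.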