NEHA SINGH's other Models Reports


Disease Prediction Using Random Forest Classifier

Model Overview

Doctors routinely reason from symptoms to a likely cause. In this use case we do the same with a model: we predict the prognosis from the symptoms a patient reports. The prediction can serve as a second opinion against which a doctor checks their own assessment. Various apps and websites already claim to offer doctor-like advice based on such models, and we are building something similar here.

About the DataSet

The complete dataset consists of two CSV files: one for training and one for testing the model.

Each CSV file has 133 columns. The first 132 are symptoms a person may experience, and the last column is the prognosis. (As loaded, the training file also carries an empty trailing column, "Unnamed: 133", which is dropped during preparation.)

These symptom combinations map to 41 diseases, which are the classes the model predicts.

The model is trained on the training data and tested on the testing data.


Shape of the training split: (4428, 132)

Both a training and a testing dataset are provided, and every symptom value is binary (0 or 1).

The classes are evenly distributed: each prognosis class has 120 rows.
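The class balance described above can be checked with a per-class row count. The following is a minimal sketch on a toy label column; the helper name class_counts and the toy data are hypothetical, and on the real data the same call would be made on the "prognosis" column.

```python
import pandas as pd


def class_counts(labels: pd.Series) -> pd.Series:
    """Return the number of rows per class, sorted by class label."""
    return labels.value_counts().sort_index()


# Toy stand-in for the real "prognosis" column (hypothetical data).
toy_labels = pd.Series(["Acne", "Allergy", "Acne", "Allergy", "GERD", "GERD"])

counts = class_counts(toy_labels)

# The classes are balanced when every class has the same row count.
is_balanced = bool(counts.nunique() == 1)

print(counts)
print("Balanced:", is_balanced)
```

If the counts differ between classes, stratified splitting and per-class metrics become more important than plain accuracy.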

Model Used: Random Forest Classifier
Accuracy: 93%
F1 Score: 0.91
Recall: 0.90
Precision: 0.92

The dataset has 132 symptoms that serve as features for predicting the prognosis:

'itching', 'skin_rash', 'nodal_skin_eruptions', 'continuous_sneezing', 'shivering', 'chills', 'joint_pain', 'stomach_pain', 'acidity', 'ulcers_on_tongue', 'muscle_wasting', 'vomiting', 'burning_micturition', 'spotting_ urination' ,'fatigue',
'weight_gain', 'anxiety' ,'cold_hands_and_feets' ,'mood_swings', 'weight_loss' ,'restlessness', 'lethargy',
'patches_in_throat', 'irregular_sugar_level', 'cough', 'high_fever', 'sunken_eyes','breathlessness', 'sweating',
'dehydration' ,'indigestion', 'headache', 'yellowish_skin', 'dark_urine' ,'nausea' ,'loss_of_appetite',
'pain_behind_the_eyes', 'back_pain','constipation', 'abdominal_pain', 'diarrhoea', 'mild_fever', 'yellow_urine',
'yellowing_of_eyes', 'acute_liver_failure' ,'fluid_overload', 'swelling_of_stomach', 'swelled_lymph_nodes',
'malaise', 'blurred_and_distorted_vision', 'phlegm' ,'throat_irritation', 'redness_of_eyes', 'sinus_pressure',
'runny_nose', 'congestion', 'chest_pain', 'weakness_in_limbs', 'fast_heart_rate', 'pain_during_bowel_movements',
'pain_in_anal_region', 'bloody_stool', 'irritation_in_anus', 'neck_pain', 'dizziness', 'cramps', 'bruising',
'obesity', 'swollen_legs', 'swollen_blood_vessels', 'puffy_face_and_eyes', 'enlarged_thyroid', 'brittle_nails',
'swollen_extremeties', 'excessive_hunger', 'extra_marital_contacts' ,'drying_and_tingling_lips', 'slurred_speech',
'knee_pain', 'hip_joint_pain', 'muscle_weakness' ,'stiff_neck', 'swelling_joints', 'movement_stiffness', 'spinning_movements',
'loss_of_balance', 'unsteadiness', 'weakness_of_one_body_side', 'loss_of_smell', 'bladder_discomfort',
'foul_smell_of urine', 'continuous_feel_of_urine', 'passage_of_gases', 'internal_itching', 'toxic_look_(typhos)',
'depression', 'irritability', 'muscle_pain', 'altered_sensorium', 'red_spots_over_body', 'belly_pain',
'abnormal_menstruation', 'dischromic _patches', 'watering_from_eyes', 'increased_appetite', 'polyuria', 'family_history',
'mucoid_sputum', 'rusty_sputum', 'lack_of_concentration', 'visual_disturbances', 'receiving_blood_transfusion',
'receiving_unsterile_injections', 'coma', 'stomach_bleeding', 'distention_of_abdomen', 'history_of_alcohol_consumption',
'fluid_overload.1', 'blood_in_sputum', 'prominent_veins_on_calf', 'palpitations', 'painful_walking', 'pus_filled_pimples', 'blackheads', 'scurring', 'skin_peeling',
'silver_like_dusting', 'small_dents_in_nails', 'inflammatory_nails', 'blister', 'red_sore_around_nose', 'yellow_crust_ooze'

The prognosis column takes one of 41 disease labels:

'(vertigo) Paroymsal  Positional Vertigo', 'AIDS', 'Acne', 'Alcoholic hepatitis', 'Allergy', 'Arthritis', 'Bronchial Asthma', 'Cervical spondylosis', 'Chicken pox', 'Chronic cholestasis', 'Common Cold', 'Dengue', 'Diabetes ', 'Dimorphic hemmorhoids(piles)', 'Drug Reaction', 'Fungal infection', 'GERD', 'Gastroenteritis', 'Heart attack', 'Hepatitis B', 'Hepatitis C', 'Hepatitis D', 'Hepatitis E', 'Hypertension ', 'Hyperthyroidism', 'Hypoglycemia', 'Hypothyroidism', 'Impetigo', 'Jaundice', 'Malaria', 'Migraine', 'Osteoarthristis', 'Paralysis (brain hemorrhage)', 'Peptic ulcer diseae', 'Pneumonia', 'Psoriasis', 'Tuberculosis', 'Typhoid', 'Urinary tract infection', 'Varicose veins','hepatitis A'

Importing Libraries

# Importing the library
import pandas as pd

Reading Training Dataset

# Reading the training dataset
df = pd.read_csv("training_data.csv")

# Checking the shape of the dataset
print(df.shape)

(4920, 134)

# Storing the prognosis (target) column in y_train
y_train = df["prognosis"]


# Deleting the target column from df now that it is stored in y_train
del df["prognosis"]

# Deleting the empty "Unnamed: 133" column, as it is of no use to us
del df["Unnamed: 133"]


# Checking for NULL values
print(df.isnull().sum().sum())

The dataset has no NULL values. There is little further data cleaning we can do, because the dataset itself is of limited quality. We can either proceed with it as it is or switch to a better dataset.
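A few quick checks can make the data-quality picture more concrete. The sketch below runs on a small toy frame (hypothetical data, with one missing value and one repeated row added on purpose) and counts missing values, verifies that the remaining values are 0 or 1, and counts duplicated rows. The duplicate check is an extra assumption of this sketch, not a step from the walkthrough.

```python
import pandas as pd

# Toy frame standing in for the symptom columns (hypothetical data).
toy = pd.DataFrame({
    "itching":   [1, 0, 1, 1],
    "skin_rash": [0, 0, 1, 0],
    "chills":    [0, 1, None, 0],  # one deliberately missing value
})

# Missing values per column, and in total.
null_counts = toy.isnull().sum()
total_nulls = int(null_counts.sum())

# Check that every non-missing value is 0 or 1.
is_binary = bool(toy.dropna().isin([0, 1]).all().all())

# Rows that repeat an earlier row exactly.
duplicate_rows = int(toy.duplicated().sum())

print(null_counts)
print("Total NULLs:", total_nulls)
print("All values binary:", is_binary)
print("Duplicate rows:", duplicate_rows)
```

On the real data the same three checks would be applied to df after the target and the empty column have been removed.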

# Storing the training features in X
X = df

# Storing the target column in Y
Y = y_train

Next we split the dataset into training and test sets, so that the model can be evaluated on rows it was not trained on.

Splitting the Data Set

# Importing train_test_split from sklearn for the train/test split
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1, stratify=Y, random_state=2)

# Checking the shapes of the full, train, and test datasets
print(X.shape, X_train.shape, X_test.shape)

(4920, 132) (4428, 132) (492, 132)
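The stratify argument keeps the class proportions the same in both parts of the split. A minimal sketch on toy data (three hypothetical classes with 10 rows each) shows the effect of the same test_size and random_state values used above:

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 30 rows, 3 classes with 10 rows each.
X_toy = np.arange(30).reshape(30, 1)
y_toy = np.repeat(["A", "B", "C"], 10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.1, stratify=y_toy, random_state=2
)

# Count the rows per class on each side of the split.
train_counts = dict(Counter(y_tr.tolist()))
test_counts = dict(Counter(y_te.tolist()))

print("Train:", train_counts)
print("Test: ", test_counts)
```

With 120 rows per class and test_size=0.1, the same logic gives 108 training rows and 12 test rows per class, which agrees with the printed shapes (41 x 108 = 4428 and 41 x 12 = 492).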

Making Random Forest Classifier

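from sklearn.ensemble import RandomForestClassifier

# Building the random forest and fitting it on the training split
classifier = RandomForestClassifier(n_estimators=200, criterion='entropy', random_state=0)
classifier.fit(X_train, Y_train)

# Predicting on the held-out split
prediction_rfc = classifier.predict(X_test)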

import sklearn.metrics as metrics

print('Confusion Matrix: Random Forest Classifier')
print(metrics.confusion_matrix(Y_test, prediction_rfc))
print('\nClassification Report:')
print(metrics.classification_report(Y_test, prediction_rfc))
print('Accuracy:', metrics.accuracy_score(Y_test, prediction_rfc))

Accuracy: 1.0

The accuracy on the held-out 10% split came out to be 1.0 (100%).
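A single 90/10 split gives only one estimate of accuracy. One way to see whether a score is stable is stratified k-fold cross-validation. The sketch below is an addition to the walkthrough, not part of it: it uses toy data (three hypothetical classes, each with a fixed symptom signature) and a smaller forest so that it runs quickly.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Toy stand-in for the symptom matrix (hypothetical data): three classes,
# each with its own fixed 0/1 symptom signature, repeated 20 times.
signatures = {
    "Acne":    [1, 1, 0, 0, 0, 0],
    "GERD":    [0, 0, 1, 1, 0, 0],
    "Malaria": [0, 0, 0, 0, 1, 1],
}
X_toy = np.array([sig for sig in signatures.values() for _ in range(20)])
y_toy = np.array([name for name in signatures for _ in range(20)])

model = RandomForestClassifier(n_estimators=50, criterion="entropy", random_state=0)

# Five stratified folds: each fold keeps the class proportions of the full data.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)
scores = cross_val_score(model, X_toy, y_toy, cv=cv, scoring="accuracy")

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

Because every row of a class is identical in this toy data, the folds are trivially easy; the point is the procedure, which would be applied to X and Y from the steps above.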

# Reading the testing dataset
dft = pd.read_csv("test_data.csv")

# Viewing the dataset
print(dft.head())

# Storing the target column of the testing dataset in y_test
y_test = dft["prognosis"]

# Checking the dataset
print(dft.shape)

# Deleting the prognosis column from the testing dataset
del dft["prognosis"]


# Predicting on the testing dataset with the classifier trained above
prediction = classifier.predict(dft)

# Printing the predicted values
print(prediction)

# Checking the accuracy of the predictions (true labels first, then predictions)
print("Accuracy: ", metrics.accuracy_score(y_test, prediction))

As you can see, the accuracy on the separate testing dataset also comes out at 100%.
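To predict for a single patient, the reported symptoms have to be turned into one 0/1 row with the same columns, in the same order, as the training data. The following is a minimal sketch under stated assumptions: the helper symptoms_to_row is hypothetical, and the four-feature toy model stands in for the classifier trained above.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def symptoms_to_row(symptoms, feature_names):
    """Build a single-row 0/1 DataFrame in the training column order."""
    unknown = set(symptoms) - set(feature_names)
    if unknown:
        raise ValueError(f"Unknown symptoms: {sorted(unknown)}")
    present = set(symptoms)
    values = [1 if name in present else 0 for name in feature_names]
    return pd.DataFrame([values], columns=feature_names)


# Toy training data (hypothetical): two diseases with distinct symptoms.
feature_names = ["itching", "skin_rash", "cough", "high_fever"]
X_toy = pd.DataFrame(
    [[1, 1, 0, 0]] * 10 + [[0, 0, 1, 1]] * 10, columns=feature_names
)
y_toy = ["Fungal infection"] * 10 + ["Pneumonia"] * 10

model = RandomForestClassifier(n_estimators=20, random_state=0)
model.fit(X_toy, y_toy)

# Encode the symptoms a patient reports and predict the prognosis.
row = symptoms_to_row(["cough", "high_fever"], feature_names)
predicted = model.predict(row)[0]
print(predicted)
```

With the real model, feature_names would be list(X.columns), so that the row matches the 132 symptom columns used for training.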

You can run the model yourself from the Deployment link provided above.