Prasad Chaskar's other Models Reports


Patient Treatment Classification

Model Overview

In hospitals, medical treatments and surgeries can be categorized into inpatient and outpatient procedures. For patients, it is important to understand the difference between these two types of care, because they impact the length of a patient’s stay in a medical facility and the cost of a procedure.

Inpatient Care (Incare Patient) and Outpatient Care (Outcare Patient)

The difference between inpatient and outpatient care is how long a patient must remain in the facility where the procedure is performed.
Inpatient care requires overnight hospitalization. Patients must stay at the medical facility where their procedure was done (which is usually a hospital) for at least one night. During this time, they remain under the supervision of a nurse or doctor.
Patients receiving outpatient care do not need to spend a night in a hospital. They are free to leave the hospital once the procedure is over. In some exceptional cases, they need to wait while anesthesia wears off or to make sure there are not any complications. As long as there are not any serious complications, patients do not have to spend the night being supervised. [source of information: pbmhealth]


Problem Statement
Automation frees skilled staff for the work where their expertise matters most. As part of a hospital automation system, one can build a system that predicts whether a patient should be categorized as an incare patient or an outcare patient, using several data points about the patient, their condition and their lab tests.

Build a machine learning model to predict whether a patient should be classified as in care or out care based on the patient's laboratory test results.


About the data

The dataset, Electronic Health Record Predicting, was collected from a private hospital in Indonesia. It contains patients' laboratory test results, which are used to determine the next treatment for each patient: in care or out care.

Attribute Information

Name / Data Type / Value Sample / Description

HAEMATOCRIT / Continuous / 35.1 / Patient laboratory test result of haematocrit

HAEMOGLOBINS / Continuous / 11.8 / Patient laboratory test result of haemoglobins

ERYTHROCYTE / Continuous / 4.65 / Patient laboratory test result of erythrocyte

LEUCOCYTE / Continuous / 6.3 / Patient laboratory test result of leucocyte

THROMBOCYTE / Continuous / 310 / Patient laboratory test result of thrombocyte

MCH / Continuous / 25.4 / Patient laboratory test result of MCH

MCHC / Continuous / 33.6 / Patient laboratory test result of MCHC

MCV / Continuous / 75.5 / Patient laboratory test result of MCV

AGE / Continuous / 12 / Patient age

SEX / Nominal – Binary / F / Patient gender

SOURCE / Nominal / {1, 0} / The class target: 1 = in care patient, 0 = out care patient



Required Libraries:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

Load Dataset:

df = pd.read_csv('training_set.csv')

Check for null values:


There are no null values present in our dataset.
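The code for this check is not shown in the report. A minimal sketch of how it could be done with pandas follows; the `sample` frame and its values are hypothetical stand-ins for `training_set.csv`, with one missing value added on purpose so the output is not all zeros.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame standing in for training_set.csv
sample = pd.DataFrame({
    "HAEMATOCRIT": [35.1, 43.5, np.nan],  # one value deliberately missing
    "AGE": [12, 40, 67],
    "SEX": ["F", "M", "F"],
})

# Count missing entries per column
null_counts = sample.isnull().sum()
print(null_counts)

# True only when no column contains a missing value
has_no_nulls = not sample.isnull().values.any()
print(has_no_nulls)
```

On the real data, `df.isnull().sum()` returning zero for every column would correspond to the statement above.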

Data Visualization:


plt.title("Distribution of feature hematocrit",{'fontsize':20})

Haematocrit is the ratio of the volume of red blood cells to the total volume of blood. The measurement depends on the number and size of the red blood cells. It is normally 40.7–50.3% for males and 36.1–44.3% for females. The distribution above shows that most patients in our data have a haematocrit percentage in the normal range.


plt.title("Distribution of feature HAEMOGLOBINS",{'fontsize':20})

Hemoglobin is a protein in red blood cells that carries oxygen. The hemoglobin test measures how much hemoglobin is in your blood.
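Only the `plt.title` calls survive from the plotting code, so the plot type used in the report is unknown. As an assumption, the sketch below draws plain matplotlib histograms for the two features; the `sample` values are hypothetical, and the `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values standing in for the real columns
sample = pd.DataFrame({
    "HAEMATOCRIT": [35.1, 43.5, 33.8, 39.1, 30.5, 42.0],
    "HAEMOGLOBINS": [11.8, 14.8, 11.3, 13.7, 9.9, 14.1],
})

# One histogram per feature, side by side
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
for ax, column in zip(axes, sample.columns):
    ax.hist(sample[column], bins=5)
    ax.set_title(f"Distribution of feature {column}", fontsize=20)
    ax.set_xlabel(column)
    ax.set_ylabel("Number of patients")
fig.tight_layout()
```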

Distribution of in care and out care patients by gender:
1 = in care patient
0 = out care patient

female_df = df[df.SEX=='F']
male_df = df[df.SEX=='M']

plt.title("Female patients",{'fontsize':20})

plt.title("Male patients",{'fontsize':20})
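The plotting calls for the two gender subsets are not shown. One way to obtain the same counts without a chart is a cross-tabulation of `SEX` against `SOURCE`; this is a sketch on hypothetical rows, not the report's own code.

```python
import pandas as pd

# Hypothetical rows; in the report these columns come from df
sample = pd.DataFrame({
    "SEX": ["F", "F", "M", "M", "M", "F"],
    "SOURCE": [1, 0, 1, 1, 0, 0],  # 1 = in care, 0 = out care
})

# Rows are genders, columns are class labels, cells are patient counts
counts = pd.crosstab(sample["SEX"], sample["SOURCE"])
print(counts)
```

Passing `normalize="index"` to `pd.crosstab` would give the share of in care and out care patients within each gender instead of raw counts.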

Split Data:
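The code that defines `X` and `y` is not shown. Assuming `SOURCE` is the target, as the attribute table states, and every other column is a feature, the split could look like the sketch below. The `sample` frame is a hypothetical stand-in for `df`.

```python
import pandas as pd

# Hypothetical stand-in for df
sample = pd.DataFrame({
    "HAEMATOCRIT": [35.1, 43.5],
    "AGE": [12, 40],
    "SEX": ["F", "M"],
    "SOURCE": [1, 0],
})

# Features: every column except the class target
X = sample.drop(columns=["SOURCE"])
# Target: 1 = in care patient, 0 = out care patient
y = sample["SOURCE"]

print(X.columns.tolist())
print(y.tolist())
```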

Handle Categorical Variables:

def fun(sex):
    # Encode gender as an integer: 'M' -> 0, any other value -> 1
    if sex == 'M':
        return 0
    return 1

X.SEX = X.SEX.apply(fun)
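An equivalent encoding can be written with `Series.map` and an explicit dictionary, which keeps the mapping visible in one place. This is an alternative sketch on a hypothetical series, not the report's code.

```python
import pandas as pd

# Hypothetical gender column
sex = pd.Series(["M", "F", "F", "M"], name="SEX")

# Explicit mapping: 'M' -> 0, 'F' -> 1
encoded = sex.map({"M": 0, "F": 1})
print(encoded.tolist())
```

One behavioural difference is worth noting: `map` yields `NaN` for any value missing from the dictionary, whereas the function above sends every value other than `'M'` to 1.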

Divide Data into train data and test data (validation data):

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.1,random_state=11)
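With `test_size=0.1`, one tenth of the rows is held out for evaluation. The sketch below shows the resulting shapes on hypothetical arrays. The `stratify` argument is an addition not used in the report; it keeps the class ratio the same in both parts, which can matter when the hold-out set is this small.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 rows, 2 features, 60/40 class balance
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([0] * 60 + [1] * 40)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    test_size=0.1,      # 10% of rows held out, as in the report
    random_state=11,    # fixed seed for reproducibility
    stratify=y_demo,    # assumption: preserve the class ratio in both parts
)

print(X_tr.shape, X_te.shape)
print(y_te.mean())  # share of class 1 in the hold-out set
```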

Data Scaling:

scalar = StandardScaler()
X_train = scalar.fit_transform(X_train)
X_test = scalar.transform(X_test)
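The scaler is fitted on the training data only and then reused on the test data, so no information from the test set leaks into the preprocessing. A small sketch on hypothetical numbers shows the effect: training columns end up with mean 0 and standard deviation 1, while test rows are shifted and scaled with the training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature values
train = np.array([[30.0, 10.0], [40.0, 14.0], [50.0, 18.0]])
test = np.array([[40.0, 14.0], [60.0, 22.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train
test_scaled = scaler.transform(test)        # reuses the training mean and std

print(scaler.mean_)               # per-column training means
print(train_scaled.mean(axis=0))  # approximately 0 per column
print(train_scaled.std(axis=0))   # approximately 1 per column
print(test_scaled)
```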

Model Building:

models = {
    LogisticRegression(max_iter=500): 'Logistic Regression',
    SVC(): 'Support Vector Machine',
    RandomForestClassifier(): 'Random Forest',
}

# Fit each candidate model on the scaled training data
for m in models.keys():
    m.fit(X_train, y_train)

# Report test-set accuracy for each model
for model, name in models.items():
    print(f"Accuracy Score for {name} is : ", model.score(X_test, y_test) * 100, "%")

Random forest gives higher accuracy on the test data than the other two classifiers, so we choose it for predictions.
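A comparison based on a single 10% hold-out set can shift with the random split. As a technique not used in the report, k-fold cross-validation averages accuracy over several splits. The sketch below runs it on synthetic data from `make_classification`; the dataset, `n_estimators=50` and the pipelines are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the patient data
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=11)

# Scaling sits inside the pipeline so it is refitted on each training fold
candidates = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=500)),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=11),
}

mean_accuracy = {}
for name, model in candidates.items():
    # Accuracy on each of 5 folds
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")
    mean_accuracy[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```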

random_forest = RandomForestClassifier(n_estimators=200,
                                       min_samples_split=2,
                                       min_samples_leaf=2,
                                       max_features='sqrt',  # 'auto' meant 'sqrt' for classifiers and was removed in scikit-learn 1.3
                                       max_depth=70,
                                       bootstrap=True).fit(X_train, y_train)

Classification Report:

1 = in care patient
0 = out care patient

pred = random_forest.predict(X_test)
print(classification_report(y_test, pred))
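To show how such a report is read, here is a small sketch on hypothetical labels. The `target_names` strings are assumptions chosen to match the 1 = in care, 0 = out care encoding; `output_dict=True` returns the same numbers as a dictionary so individual metrics can be looked up.

```python
from sklearn.metrics import classification_report

# Hypothetical true and predicted labels (1 = in care, 0 = out care)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

# Human-readable table; names follow sorted label order: 0 first, then 1
print(classification_report(y_true, y_pred, target_names=["out care", "in care"]))

# Same metrics as a nested dictionary
report = classification_report(
    y_true, y_pred, target_names=["out care", "in care"], output_dict=True
)
print(report["in care"]["precision"], report["in care"]["recall"])
```

Precision for a class is the share of patients predicted in that class who truly belong to it; recall is the share of patients truly in that class whom the model found.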