In hospitals, medical treatments and surgeries can be categorized into inpatient and outpatient procedures. For patients, it is important to understand the difference between these two types of care, because they impact the length of a patient’s stay in a medical facility and the cost of a procedure.
Inpatient Care (Incare Patient) and Outpatient Care (Outcare Patient)
The difference between inpatient and outpatient care is how long a patient must remain in the facility where the procedure is done.
Inpatient care requires overnight hospitalization. Patients must stay at the medical facility where their procedure was done (which is usually a hospital) for at least one night. During this time, they remain under the supervision of a nurse or doctor.
Patients receiving outpatient care do not need to spend a night in a hospital. They are free to leave once the procedure is over. In some cases, they need to wait while anesthesia wears off or while staff make sure there are no complications. As long as there are no serious complications, patients do not have to spend the night under supervision. [source of information: pbmhealth]
Problem Statement
In today’s world of automation, automating routine tasks lets people apply their skills and knowledge where they are most needed. As part of a hospital automation system, one can build a system that predicts whether a patient should be categorized as an in care patient or an out care patient, using data about the patient, their condition, and their lab tests.
Objective
Build a machine learning model to predict whether a patient should be classified as in care or out care based on the patient's laboratory test results.
About the data
The dataset, Electronic Health Record Predicting, was collected from a private hospital in Indonesia. It contains patients' laboratory test results, which are used to determine whether the patient's next treatment should be in care or out care.
Attribute Information
Attribute / Type / Example / Description
HAEMATOCRIT / Continuous / 35.1 / Patient laboratory test result of haematocrit
HAEMOGLOBINS / Continuous / 11.8 / Patient laboratory test result of haemoglobins
ERYTHROCYTE / Continuous / 4.65 / Patient laboratory test result of erythrocyte
LEUCOCYTE / Continuous / 6.3 / Patient laboratory test result of leucocyte
THROMBOCYTE / Continuous / 310 / Patient laboratory test result of thrombocyte
MCH / Continuous / 25.4 / Patient laboratory test result of MCH (mean corpuscular haemoglobin)
MCHC / Continuous / 33.6 / Patient laboratory test result of MCHC (mean corpuscular haemoglobin concentration)
MCV / Continuous / 75.5 / Patient laboratory test result of MCV (mean corpuscular volume)
AGE / Continuous / 12 / Patient age
SEX / Nominal (binary) / F / Patient gender (F or M)
SOURCE / Nominal / {1, 0} / Class target: 1 = in care patient, 0 = out care patient
Link: https://www.kaggle.com/manishkc06/patient-treatment-classification
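The example values in the attribute table are consistent with the standard red-cell index formulas, which derive MCV, MCH and MCHC from haematocrit, haemoglobin and erythrocyte count. The units are an assumption here (haematocrit in %, haemoglobin in g/dL, erythrocytes in millions per µL), since the dataset description does not state them, and `red_cell_indices` is a hypothetical helper written only for this check:

```python
def red_cell_indices(haematocrit, haemoglobin, erythrocyte):
    """Compute MCV, MCH and MCHC from the three primary measurements.

    Assumed units: haematocrit in %, haemoglobin in g/dL,
    erythrocyte count in millions per microlitre.
    """
    mcv = haematocrit * 10 / erythrocyte    # mean corpuscular volume, fL
    mch = haemoglobin * 10 / erythrocyte    # mean corpuscular haemoglobin, pg
    mchc = haemoglobin * 100 / haematocrit  # mean corpuscular haemoglobin concentration, g/dL
    return mcv, mch, mchc


# Example row from the attribute table
mcv, mch, mchc = red_cell_indices(haematocrit=35.1, haemoglobin=11.8, erythrocyte=4.65)
print(round(mcv, 1), round(mch, 1), round(mchc, 1))  # 75.5 25.4 33.6
```

Because MCV, MCH and MCHC are largely determined by the other three tests, these features carry overlapping information. That does not break a random forest, but it is worth remembering when interpreting per-feature coefficients or importances.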
Code:
Required Libraries:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
Load Dataset:
df = pd.read_csv('training_set.csv')
df.head()
Check for null values:
df.isnull().sum()
There are no null values present in our dataset.
Data Visualization:
Feature HAEMATOCRIT:
plt.figure(figsize=(8,7))
sns.histplot(x='HAEMATOCRIT',data=df,color='b');
plt.title("Distribution of feature hematocrit",{'fontsize':20})
Haematocrit is the ratio of the volume of red blood cells to the total volume of blood. The measurement depends on the number and size of red blood cells. It is normally 40.7–50.3% for males and 36.1–44.3% for females. The distribution above suggests that most patients in our data have a haematocrit within the normal range.
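To put a number on that observation, one can compute, per sex, the share of patients whose haematocrit falls inside the reference ranges quoted above. A minimal sketch, assuming SEX is coded as 'M'/'F'; `share_in_normal_range` is a hypothetical helper and the rows are made up, so call it on the real `df` instead:

```python
import pandas as pd

# Reference ranges for haematocrit (%) quoted above, by sex
LOW = {'M': 40.7, 'F': 36.1}
HIGH = {'M': 50.3, 'F': 44.3}


def share_in_normal_range(df):
    """Fraction of patients of each sex whose HAEMATOCRIT is within the reference range."""
    low = df['SEX'].map(LOW)    # per-row lower bound (NaN for unknown codes)
    high = df['SEX'].map(HIGH)  # per-row upper bound
    # Comparisons against NaN are False, so unknown codes count as out of range
    in_range = (df['HAEMATOCRIT'] >= low) & (df['HAEMATOCRIT'] <= high)
    return in_range.astype(float).groupby(df['SEX']).mean()


# Made-up rows for illustration only
toy = pd.DataFrame({'SEX': ['M', 'M', 'F', 'F'],
                    'HAEMATOCRIT': [45.0, 38.0, 40.0, 35.1]})
result = share_in_normal_range(toy)
print(result)
```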
Feature HAEMOGLOBINS:
plt.figure(figsize=(8,7))
sns.histplot(x='HAEMOGLOBINS',data=df,color='b');
plt.title("Distribution of feature HAEMOGLOBINS",{'fontsize':20})
Hemoglobin is a protein in red blood cells that carries oxygen. The hemoglobin test measures how much hemoglobin is in your blood.
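Target distribution by sex:
female_df = df[df.SEX == 'F']
male_df = df[df.SEX == 'M']
# Start a new figure for each plot so the two countplots are not drawn on the same axes
plt.figure(figsize=(8,7))
sns.countplot(x='SOURCE', data=female_df);
plt.title("Female patients", {'fontsize':20})
plt.figure(figsize=(8,7))
sns.countplot(x='SOURCE', data=male_df);
plt.title("Male patients", {'fontsize':20})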
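The two count plots can also be summarised numerically as the proportion of each SOURCE class within each sex. A sketch using `pandas.crosstab`; `in_care_rate_by_sex` is a hypothetical helper and the rows are made up, so pass the real `df` in practice:

```python
import pandas as pd


def in_care_rate_by_sex(df):
    """Proportion of each SOURCE class (1 = in care, 0 = out care) within each sex."""
    # normalize='index' divides each row of the SEX x SOURCE table by its row total
    return pd.crosstab(df['SEX'], df['SOURCE'], normalize='index')


# Made-up rows for illustration only
toy = pd.DataFrame({'SEX': ['F', 'F', 'F', 'M', 'M'],
                    'SOURCE': [1, 0, 0, 1, 1]})
rates = in_care_rate_by_sex(toy)
print(rates)
```

Unlike raw counts, these proportions can be compared directly even when the data contains more patients of one sex than the other.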
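Split Data:
# Features are every column except the target SOURCE
X = df.drop('SOURCE', axis=1)
y = df['SOURCE']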
Handle Categorical Variables:
# Encode SEX as a number: 'M' -> 0, anything else ('F') -> 1
def encode_sex(sex):
    if sex == 'M':
        return 0
    return 1

X['SEX'] = X['SEX'].apply(encode_sex)
Divide Data into train data and test data (validation data):
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.1,random_state=11)
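The split above is not stratified, so with a 10% test set the class ratio in the test data can drift from the full dataset by chance. Passing `stratify` keeps the ratio (almost) fixed. This is an optional alternative, not what the original notebook does; the sketch uses synthetic labels with roughly 30% positives in place of `X` and `y`:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic features and imbalanced labels for illustration only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 3))
y_demo = (rng.random(1000) < 0.3).astype(int)

# stratify=y_demo allocates each class proportionally to train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=11, stratify=y_demo)

print(f"overall: {y_demo.mean():.3f}  train: {y_tr.mean():.3f}  test: {y_te.mean():.3f}")
```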
Data Scaling:
# Fit the scaler on the training data only, then apply the same transform to the test data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
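StandardScaler standardises each column to z = (x − μ) / σ, where μ and σ are learned from the data it is fitted on. Fitting on the training set and only calling `transform` on the test set keeps test statistics out of training. A minimal check of the formula on a small made-up array:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values for two features, four patients
X_demo = np.array([[35.1, 6.3], [40.0, 8.1], [45.2, 4.9], [38.7, 10.2]])

scaler = StandardScaler().fit(X_demo)

# Same transform by hand: per-column mean and population standard deviation (ddof=0)
manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(np.allclose(scaler.transform(X_demo), manual))  # True
```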
Model Building:
models = {
    'Logistic Regression': LogisticRegression(max_iter=500),
    'Support Vector Machine': SVC(),
    'Random Forest': RandomForestClassifier(),
}

# Fit each model on the scaled training data and report its test accuracy
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"Accuracy score for {name}: {model.score(X_test, y_test) * 100:.2f}%")
Random Forest gives the highest accuracy of the three models on this test split, so we choose it for predictions.
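A single 10% test split gives a noisy accuracy estimate, so small gaps between models may not be meaningful. A sketch of a more robust comparison with stratified 5-fold cross-validation; it uses synthetic data from `make_classification` as a stand-in for the encoded `X` and `y`, and puts the scaler inside a pipeline so it is refitted on each training fold only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 10 features; replace with X, y
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)

models = {
    'Logistic Regression': LogisticRegression(max_iter=500),
    'Support Vector Machine': SVC(),
    'Random Forest': RandomForestClassifier(random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = {}
for name, model in models.items():
    # Scaling inside the pipeline avoids leaking validation-fold statistics
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipe, X_demo, y_demo, cv=cv, scoring='accuracy')
    cv_scores[name] = scores
    print(f"{name}: {scores.mean() * 100:.2f}% (+/- {scores.std() * 100:.2f})")
```

Comparing the mean together with the spread across folds shows whether one model is consistently better or only better on a particular split.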
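# max_features='auto' was deprecated in scikit-learn 1.1 and removed in 1.3;
# for classifiers it meant 'sqrt', which is used here instead
random_forest = RandomForestClassifier(n_estimators=200,
                                       min_samples_split=2,
                                       min_samples_leaf=2,
                                       max_features='sqrt',
                                       max_depth=70,
                                       bootstrap=True)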
random_forest.fit(X_train,y_train)
random_forest.score(X_test,y_test)
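The source does not say how these hyperparameters were chosen. One common way to arrive at values like these is a randomized search with cross-validation; the sketch below assumes that approach, with an illustrative search space that includes the values used above and synthetic data in place of `X_train`, `y_train`:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data; replace with X_train, y_train
X_demo, y_demo = make_classification(n_samples=400, n_features=10, random_state=0)

# Illustrative search space (an assumption, not the author's grid)
param_distributions = {
    'n_estimators': [50, 100, 200],
    'max_depth': [10, 30, 70, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2'],
    'bootstrap': [True, False],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=8,            # number of random parameter combinations to try
    cv=3,                # 3-fold cross-validation for each combination
    scoring='accuracy',
    random_state=0,
)
search.fit(X_demo, y_demo)
print(search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")
```

With the default `refit=True`, `search.best_estimator_` is retrained on all of the data passed to `fit`, so it can be used in place of `random_forest`.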
Classification Report:
1 = in care patient
0 = out care patient
pred = random_forest.predict(X_test)
print(classification_report(y_test,pred))
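The classification report lists precision, recall and F1 for each class. For the in care class (label 1) these follow directly from the confusion matrix. The sketch below computes them by hand on made-up labels and checks them against scikit-learn; in practice, use `y_test` and `pred`:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Made-up labels (1 = in care, 0 = out care)
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 1])

# With labels=[0, 1], ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

precision = tp / (tp + fp)  # of patients predicted in care, the share truly in care
recall = tp / (tp + fn)     # of patients truly in care, the share predicted in care
f1 = 2 * precision * recall / (precision + recall)

p, r, f, _ = precision_recall_fscore_support(y_true, y_pred, average='binary', pos_label=1)
print(precision, recall, f1)
```

The report's row for class 0 uses the same formulas with 0 treated as the positive label.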