Prasad Chaskar's other Models Reports


Body Performance Prediction Using Random Forest


Model Overview

Physical fitness builds strong bones and muscles, supports better health and well-being, and reduces the risk of conditions such as high blood pressure, diabetes and cancer, which in turn improves quality of life. It also reduces stress and tension, lowers the chance of depression, and makes you feel better. You can improve your physical fitness and body composition by making healthier food choices and regularly doing both aerobic and anaerobic exercise.
Fitness also develops and strengthens the muscular system and lowers the risk of common conditions, especially heart disease and obesity.

About Dataset : The dataset records each person's performance grade together with their age and a set of body and exercise measurements.


Features :



  • age : 20 ~ 64

  • gender : F,M

  • height_cm : (If you want to convert to feet, divide by 30.48)

  • weight_kg

  • body fat_%

  • diastolic : diastolic blood pressure (min)

  • systolic : systolic blood pressure (min)

  • gripForce

  • sit and bend forward_cm

  • sit-ups counts

  • broad jump_cm

  • class : A,B,C,D ( A: best) / stratified
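The height conversion mentioned in the feature list can be sketched as follows. The helper name `cm_to_feet` is hypothetical and is not part of the dataset or the original notebook.

```python
CM_PER_FOOT = 30.48  # 1 foot is defined as exactly 30.48 cm


def cm_to_feet(height_cm):
    """Convert a height in centimetres to feet."""
    return height_cm / CM_PER_FOOT


print(round(cm_to_feet(172.3), 2))  # 5.65
```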



Dataset link : https://www.kaggle.com/kukuroo3/body-performance-data


Import Libraries :

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report,confusion_matrix
import numpy as np
from sklearn.ensemble import StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

Useful Parameters :


clr = 'orangered'
title_size = 25

Import Data :


perform_df = pd.read_csv('bodyPerformance.csv')
perform_df.head()





Check whether the dataset contains any NULL values

perform_df.isnull().sum()


Data Cleaning :


# Columns that hold whole numbers are cast to integers
change_dtypes = ['age','diastolic','systolic','sit-ups counts','broad jump_cm']
for i in change_dtypes:
    perform_df[i] = perform_df[i].astype('int')
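Casting to `int` silently drops any fractional part, so it can be worth confirming that a column really holds whole numbers first. The sketch below assumes the columns are stored as floats (which the cast above implies); `toy_df` and its values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for perform_df; the values are invented for this example.
toy_df = pd.DataFrame({
    'age': [27.0, 36.0, 52.0],
    'weight_kg': [75.24, 55.8, 78.0],
})

# Keep only the columns whose values all have a zero fractional part.
whole_cols = [col for col in toy_df.columns if (toy_df[col] % 1 == 0).all()]

# Cast just those columns; the others stay as floats.
toy_df = toy_df.astype({col: 'int' for col in whole_cols})

print(whole_cols)
print(toy_df.dtypes)
```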




Create four dataframes, one for each performance class:


A- Best body performance

B- Good body performance

C- Average body performance

D- Poor body performance


A_performance = perform_df[perform_df['class'] == 'A']
B_performance = perform_df[perform_df['class'] == 'B']
C_performance = perform_df[perform_df['class'] == 'C']
D_performance = perform_df[perform_df['class'] == 'D']
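As an alternative to four separate filters, the same split can be sketched with `groupby`, which returns one sub-frame per class label. `toy_df` and `by_class` are hypothetical names used only for this illustration, and the values are made up.

```python
import pandas as pd

# Toy stand-in for perform_df; the values are invented for this example.
toy_df = pd.DataFrame({
    'age': [25, 31, 47, 52, 38, 60],
    'class': ['A', 'B', 'C', 'D', 'A', 'D'],
})

# One sub-frame per class label, keyed by the label itself.
by_class = {label: group for label, group in toy_df.groupby('class')}

print(sorted(by_class))      # ['A', 'B', 'C', 'D']
print(len(by_class['A']))    # 2
```

With this layout, `by_class['A']` plays the role of `A_performance`, and the plotting code can loop over the dictionary instead of naming each frame.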





Data Visualization :
1] Age

fig, axes = plt.subplots(2, 2, figsize=(18,10), sharey=True);

sns.histplot(ax=axes[0][0],x='age',data=A_performance,color=clr);
axes[0][0].set_title('Age distribution for best health');

sns.histplot(ax=axes[0][1],x='age',data=B_performance,color=clr);
axes[0][1].set_title('Age distribution for good health');

sns.histplot(ax=axes[1][0],x='age',data=C_performance,color=clr);
axes[1][0].set_title('Age distribution for average health');

sns.histplot(ax=axes[1][1],x='age',data=D_performance,color=clr);
axes[1][1].set_title('Age distribution for poor health');

2] Gender


fig, axes = plt.subplots(2, 2, figsize=(18,10), sharey=True);
frames = [A_performance, B_performance, C_performance, D_performance]
grades = ['best', 'good', 'average', 'poor']
for ax, df, grade in zip(axes.flat, frames, grades):
    # value_counts() sorts by frequency, so reindex to match the fixed label order
    counts = df.gender.value_counts().reindex(['M','F'], fill_value=0)
    ax.pie(counts,labels =['Male','Female'],autopct='%.0f%%',shadow=True,colors=['deepskyblue','springgreen'])
    ax.set_title(f'Gender distribution for {grade} health');


3] Height


fig, axes = plt.subplots(2, 2, figsize=(18,10), sharey=True);

sns.histplot(ax=axes[0][0],x='height_cm',data=A_performance,color=clr);
axes[0][0].set_title('Height distribution for best health');

sns.histplot(ax=axes[0][1],x='height_cm',data=B_performance,color=clr);
axes[0][1].set_title('Height distribution for good health');

sns.histplot(ax=axes[1][0],x='height_cm',data=C_performance,color=clr);
axes[1][0].set_title('Height distribution for average health');

sns.histplot(ax=axes[1][1],x='height_cm',data=D_performance,color=clr);
axes[1][1].set_title('Height distribution for poor health');


4] Weight


fig, axes = plt.subplots(2, 2, figsize=(18,10), sharey=True);

sns.histplot(ax=axes[0][0],x='weight_kg',data=A_performance,color=clr);
axes[0][0].set_title('Weight distribution for best health');

sns.histplot(ax=axes[0][1],x='weight_kg',data=B_performance,color=clr);
axes[0][1].set_title('Weight distribution for good health');

sns.histplot(ax=axes[1][0],x='weight_kg',data=C_performance,color=clr);
axes[1][0].set_title('Weight distribution for average health');

sns.histplot(ax=axes[1][1],x='weight_kg',data=D_performance,color=clr);
axes[1][1].set_title('Weight distribution for poor health');




5] Class (Target Variable)

sns.countplot(x='class',data=perform_df);


Data Preprocessing

# Encode gender as a number: M -> 0, F -> 1
def fun(df):
    if df == 'M':
        return 0
    else:
        return 1
perform_df['gender'] = perform_df.gender.apply(fun)
perform_df.head()

# Encode the target class as a number: A -> 0, B -> 1, C -> 2, D -> 3
def target_fun(df):
    if df=='A':
        return 0
    elif df=='B':
        return 1
    elif df=='C':
        return 2
    else:
        return 3
perform_df['class'] = perform_df['class'].apply(target_fun)
perform_df.head()
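The same encodings can be sketched with dictionary lookups through `Series.map`. The constant names below are hypothetical. One behavioural difference to keep in mind: the functions above send any unexpected value to the `else` branch, whereas `map` leaves unknown values as `NaN`, which makes bad input easier to spot.

```python
import pandas as pd

GENDER_CODES = {'M': 0, 'F': 1}
CLASS_CODES = {'A': 0, 'B': 1, 'C': 2, 'D': 3}
# Inverse lookup, handy for turning predicted codes back into grades.
CLASS_LABELS = {code: label for label, code in CLASS_CODES.items()}

# Toy stand-in for perform_df; the values are invented for this example.
toy_df = pd.DataFrame({'gender': ['M', 'F', 'F'], 'class': ['A', 'D', 'B']})
toy_df['gender'] = toy_df['gender'].map(GENDER_CODES)
toy_df['class'] = toy_df['class'].map(CLASS_CODES)

print(toy_df['class'].tolist())                     # [0, 3, 1]
print(toy_df['class'].map(CLASS_LABELS).tolist())   # ['A', 'D', 'B']
```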


Split data into independent (X) and dependent (y) variables

X = perform_df.drop('class',axis=1)
y = perform_df['class']


Split data into train data and test data

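X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=11)

# Standardize features: fit the scaler on the training set only, then reuse it on the test set
scalar = StandardScaler()
X_train = scalar.fit_transform(X_train)
X_test = scalar.transform(X_test)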
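The scaler is fitted on the training rows only and then reused on the test rows, so that no statistics from the test set leak into training. A minimal sketch of that behaviour on synthetic numbers (all `*_demo` names and the random data are assumptions for illustration, not the real dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_demo = rng.normal(loc=170.0, scale=8.0, size=(200, 2))  # made-up feature columns
y_demo = rng.integers(0, 4, size=200)                     # made-up class codes 0-3

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=11
)

scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # learns mean and std from training rows only
X_te_scaled = scaler.transform(X_te)      # applies those training statistics unchanged

print(X_tr_scaled.mean(axis=0))  # approximately zero by construction
print(X_te_scaled.mean(axis=0))  # near zero, but generally not exactly zero
```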

Model Training


models = {
    LogisticRegression(max_iter=500):'Logistic Regression',
    SVC():"Support Vector Machine",
    RandomForestClassifier():'Random Forest'
}
# Fit every model on the scaled training data
for m in models.keys():
    m.fit(X_train,y_train)
# Report test-set accuracy for each fitted model
for model,name in models.items():
    print(f"Accuracy Score for {name} is : ",model.score(X_test,y_test)*100,"%")
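The comparison loop can also be sketched with the dictionary keyed by model name, which is the more common layout and keeps the scores for later use. The data here comes from `make_classification`, so the numbers it prints say nothing about the body performance dataset; all `*_demo` names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic 4-class problem with 11 features, standing in for the real data.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=11, n_informative=6, n_classes=4, random_state=11
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=11
)

models = {
    'Logistic Regression': LogisticRegression(max_iter=500),
    'Support Vector Machine': SVC(),
    'Random Forest': RandomForestClassifier(random_state=11),
}

scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = model.score(X_te, y_te)  # mean accuracy on the held-out rows
    print(f"Accuracy Score for {name} is : {scores[name] * 100:.2f} %")
```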


Try some other models and stacking technique


from sklearn.ensemble import AdaBoostClassifier
adaboost = AdaBoostClassifier()
adaboost.fit(X_train,y_train)
adaboost.score(X_test,y_test)

Output : 0.6047


level0 = list()
level0.append(('lr', LogisticRegression(max_iter=1000)))
level0.append(('knn', KNeighborsClassifier()))
level0.append(('cart', DecisionTreeClassifier()))
level0.append(('svm', SVC()))
level0.append(('bayes', GaussianNB()))
level1 = RandomForestClassifier()
stack_model = StackingClassifier(estimators=level0, final_estimator=level1, cv=5)
stack_model.fit(X_train,y_train)
stack_model.score(X_test,y_test)

Output : 0.71257



Of the algorithms tried, random forest gave the best accuracy, so we use it for the final predictions.

rf = RandomForestClassifier()
rf.fit(X_train,y_train)
y_pred = rf.predict(X_test)



Classification Report 

print(classification_report(y_test,y_pred))


Confusion Matrix


y_pred = rf.predict(X_test)
class_names = ['A','B','C','D']  # the four classes, encoded above as 0-3
fig,ax = plt.subplots()
cnf_matrix = confusion_matrix(y_test,y_pred)
# Rows are actual classes, columns are predicted classes
sns.heatmap(pd.DataFrame(cnf_matrix, index=class_names, columns=class_names), annot = True,fmt = 'd')
ax.xaxis.set_label_position('top')
plt.title('Confusion Matrix for Random Forest', {'fontsize':title_size})
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
plt.tight_layout()
plt.show()
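The per-class precision and recall printed by the classification report can be read directly off a confusion matrix. The sketch below uses a made-up 4x4 matrix (not the model's real results) with rows as actual classes and columns as predicted classes, matching the layout returned by `confusion_matrix`.

```python
import numpy as np

# Hypothetical confusion matrix: rows = actual class, columns = predicted class.
cnf_demo = np.array([
    [50,  8,  2,  0],
    [10, 40,  9,  1],
    [ 3, 12, 42,  3],
    [ 0,  2,  6, 52],
])

true_pos = np.diag(cnf_demo)                 # correctly classified counts per class
precision = true_pos / cnf_demo.sum(axis=0)  # correct / everything predicted as that class
recall = true_pos / cnf_demo.sum(axis=1)     # correct / everything actually in that class
accuracy = true_pos.sum() / cnf_demo.sum()

for label, p, r in zip('ABCD', precision, recall):
    print(f"class {label}: precision={p:.2f} recall={r:.2f}")
print(f"accuracy={accuracy:.2f}")
```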


Feature Importance


feature = pd.Series(rf.feature_importances_, index = X.columns).sort_values(ascending = False)
plt.figure(figsize = (10,6))
sns.barplot(x = feature, y = feature.index)
plt.title("Feature Importance")
plt.xlabel('Score')
plt.ylabel('Features')
plt.show()
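When reading the bar chart, it helps to know that scikit-learn's impurity-based `feature_importances_` are normalized, so the scores are relative shares that add up to 1 rather than absolute effect sizes. A minimal sketch on synthetic data (the `feature_0` ... `feature_4` names and `*_demo` variables are hypothetical):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic 4-class problem; these are not the body performance features.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=11,
)
cols = [f'feature_{i}' for i in range(5)]

rf_demo = RandomForestClassifier(n_estimators=50, random_state=11).fit(X_demo, y_demo)
importance = pd.Series(rf_demo.feature_importances_, index=cols).sort_values(ascending=False)

print(importance.round(3))
print(importance.sum())  # approximately 1.0
```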


Thank You :)

















