About Dataset
Description:
Smartphones are becoming more capable every day at assisting people with their daily activities. One feature that has become popular in the fitness community is keeping a count of one's daily footsteps.
More advanced versions also distinguish between walking and running. This is achieved with the help of sensors. Here, sensor data was recorded with an iOS device and labelled as walking (0) or running (1).
Currently, the dataset contains a single file with 88588 sensor data samples collected from the accelerometer and gyroscope of an iPhone 5c in 10-second intervals at a frequency of ~5.4 samples per second. This data is represented by the following columns (each column contains sensor data for one of the sensor's axes): acceleration_x, acceleration_y, acceleration_z, gyro_x, gyro_y and gyro_z.
The activity type is given by the "activity" column, which acts as the label: 0 for walking and 1 for running.
Apart from that, the dataset contains a "wrist" column which records the wrist on which the device was worn when the sample was collected: 0 for left and 1 for right.
Additionally, the dataset contains "date", "time" and "username" columns, which give the exact date and time of each measurement and the user who collected it.
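As a quick sanity check on the figures quoted in the description, the arithmetic below works out how many samples a 10-second interval holds at ~5.4 samples per second, and roughly how much recording time 88588 samples represent. The variable names are illustrative, and the total assumes the nominal rate held for every row, which the description does not guarantee.

```python
# Figures taken from the dataset description above.
n_samples = 88588   # total rows in the file
rate_hz = 5.4       # approximate sampling frequency (samples per second)
window_s = 10       # length of one recording interval in seconds

# Samples expected in a single 10-second interval.
samples_per_window = round(rate_hz * window_s)

# Rough total recording time, assuming the nominal rate applies to every row.
total_hours = n_samples / rate_hz / 3600

print(f"~{samples_per_window} samples per interval")
print(f"~{total_hours:.1f} hours of recorded motion in total")
```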
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D
from sklearn.metrics import classification_report
Load Data:
motion_df = pd.read_csv('Kinematics_Data.csv')
motion_df.head()
Drop Unnecessary Columns:
motion_df.drop(['date','time','username'],axis=1,inplace=True)
Shape of Dataset:
motion_df.shape
After dropping those three columns, the dataset contains 88588 samples and 8 columns.
Search For NULL Values:
motion_df.isnull().sum()
There are no null values present in the dataset.
Data Types of Columns:
motion_df.dtypes
We can see that all of the remaining columns are numerical.
Data Visualization:
Target Variable (Activity):
sns.countplot(x='activity',data=motion_df);
0: Walking and 1: Running
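The count plot can be complemented with numeric class proportions, which make it easier to judge whether the target is balanced. The sketch below uses a small made-up frame (`toy_df`) in place of `motion_df`; `label_names` and `class_share` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for motion_df; the values are made up for illustration.
toy_df = pd.DataFrame({"activity": [0, 0, 1, 1, 1, 0, 1, 0]})

# Map the numeric labels to readable names (0: walking, 1: running).
label_names = {0: "walking", 1: "running"}

# Share of each class; values close to 0.5 indicate a balanced target.
class_share = (
    toy_df["activity"]
    .map(label_names)
    .value_counts(normalize=True)
)
print(class_share)
```

On the real data, the same call on `motion_df["activity"]` would show whether accuracy alone is a fair metric or whether the per-class scores need more attention.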
left_wrist = motion_df[motion_df.wrist == 0]
right_wrist = motion_df[motion_df.wrist == 1]
Gyro Motion:
fig = plt.figure(figsize=(10,7))
ax = fig.add_subplot(111, projection = '3d')
ax.scatter(left_wrist.gyro_x, left_wrist.gyro_y, left_wrist.gyro_z)
plt.title("Left Wrist Gyro Motion",{'fontsize':25});
plt.show()
fig = plt.figure(figsize=(10,7))
ax = fig.add_subplot(111, projection = '3d')
ax.scatter(right_wrist.gyro_x, right_wrist.gyro_y, right_wrist.gyro_z)
plt.title("Right Wrist Gyro Motion",{'fontsize':25});
plt.show()
Acceleration:
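plt.figure(figsize=(8,6))
# Label each series so the two wrists can be told apart in the legend
plt.scatter(x=left_wrist.acceleration_x,y=left_wrist.acceleration_y,label='Left wrist');
plt.scatter(x=right_wrist.acceleration_x,y=right_wrist.acceleration_y,label='Right wrist');
plt.xlabel('acceleration_x')
plt.ylabel('acceleration_y')
plt.legend()
plt.title("Left wrist and Right wrist acceleration(X&Y)",{'fontsize':25});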
Split Data:
Now we split the data into features (X) and target (y), and then into train and test sets. 30% of the dataset is reserved for the test set and the remaining 70% is used for training.
X = motion_df.drop('activity',axis=1)
y = motion_df.activity
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3)
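The split above draws a different random 30% on every run. A possible variant, sketched here on synthetic data rather than on the real file, fixes the seed and stratifies on the label so both sets keep the same class ratio. The names `X_demo`, `y_demo` and the seed value 42 are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features and labels (made-up values).
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(1000, 3)),
                      columns=["gyro_x", "gyro_y", "gyro_z"])
y_demo = pd.Series([0, 1] * 500, name="activity")

# stratify keeps the class ratio the same in both sets;
# random_state makes the split repeatable between runs.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=42
)

print(X_tr.shape, X_te.shape)
print(y_tr.mean(), y_te.mean())
```

Fixing the seed changes which rows land in each set, so scores obtained this way would not match the accuracies reported in this notebook digit for digit.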
Model Training:
Logistic Regression:
from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression(max_iter=3000)
log_reg.fit(X_train,y_train)
log_reg.score(X_test,y_test)*100
Accuracy: 86.05184934341725
Classification Report for Logistic Regression:
log_reg_pred = log_reg.predict(X_test)
print(classification_report(y_test,log_reg_pred))
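The large max_iter value above hints that the solver may need many iterations on the raw sensor values, although the notebook does not say so. One common option, shown here only as a sketch on synthetic data, is to standardize the features before fitting. `StandardScaler` and `make_pipeline` are standard scikit-learn tools; the data and variable names are made up, and this is not the method used for the reported accuracy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: six features whose mean shifts with the class label.
rng = np.random.default_rng(0)
n = 400
y_demo = rng.integers(0, 2, size=n)
X_demo = rng.normal(loc=y_demo[:, None] * 1.5, scale=1.0, size=(n, 6))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

# The scaler is fitted on the training data only, then applied to the test data.
scaled_log_reg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=3000))
scaled_log_reg.fit(X_tr, y_tr)

demo_accuracy = scaled_log_reg.score(X_te, y_te)
print(f"Accuracy on synthetic data: {demo_accuracy * 100:.2f}")
```

Wrapping both steps in one pipeline keeps the scaling statistics from leaking test-set information into training.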
Random Forest:
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier()
rf.fit(X_train,y_train)
rf.score(X_test,y_test)*100
Accuracy: 99.11577679948827
Classification Report for Random Forest:
rf_pred = rf.predict(X_test)
print(classification_report(y_test,rf_pred))
Finally, the random forest classifier achieved about 99.12% accuracy on the test set, compared with about 86.05% for logistic regression, so we choose it for prediction.
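The comparison above rests on a single train/test split. One way to check that the ranking of the two models is stable is k-fold cross-validation, sketched below on synthetic data generated with `make_classification`. The dataset, the fold count of 5 and the reduced `n_estimators=50` are assumptions chosen to keep the example fast; they are not settings from this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary classification problem with six features (made-up data).
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=3000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean accuracy over 5 folds for each model.
cv_means = {}
for name, model in models.items():
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")
    cv_means[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Running the same loop with `X` and `y` from the real dataset would show whether the gap between the two classifiers holds across folds.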
Thank you :)