Yoga Pose Detection Using MediaPipe


Model Overview

Introduction


Yoga has become popular across the world in the last few years. It is beneficial not only for the body but also for the mind: it improves blood flow and helps build mental clarity. For ages, yoga has been known to benefit our physical and mental health. It not only helps us stay calm but also helps us lose weight.
                                                                                      


During the ongoing coronavirus pandemic, when we are all bound to live restricted lives under the constant fear of infection, it is natural for anyone to develop anxiety. The continuous flow of negative news and the inadequacy of daily resources add to this growing anxiety and depression. Being confined at home for such long periods can be mentally challenging. When our minds are flooded with uncertainty about the future, we often experience sleepless nights and fatigue, and many of us are unable to relax, which only adds to the stress.


During this time, it is important to understand that mental health is essential to our well-being. To cope with this growing anxiety and depression, we should lead a healthy lifestyle, stay connected to our loved ones, and practice yoga at home.


Why Yoga?


Anxiety or stress usually triggers the sympathetic nervous system, which manifests as increased blood pressure, tensed muscles, lack of concentration, and faster breathing. Yoga helps to calm that down.


“Yoga is a great tool as the stretching poses help to reduce tension in muscles and joints, and this can, in turn, help relax the sympathetic system. There are many yoga poses which are excellent for managing blood pressure thereby reducing anxiety symptoms.”


                                                                                           


What is the end result?


In this model, we can pass an image and see which yoga pose is being performed.


As of now, our model can identify only three poses: Mountain Pose, Tree Pose, and Downward-Facing Dog Pose.






How did I do it?


You might be wondering why I highlighted the angles in the images above. Those angles are the crucial part of the detection, and they also open up a wide variety of use cases where this model can be applied (which I'll explain at the end).


So let us understand how this model works.


First, we need to know about the MediaPipe library and what it offers.


MediaPipe


MediaPipe is an open-source framework designed specifically for complex perception pipelines that leverage accelerated inference (e.g., on GPU or CPU). It already offers fast and accurate, yet separate, solutions for individual perception tasks; combining them all in real time into a semantically consistent end-to-end solution is a uniquely difficult problem requiring simultaneous inference of multiple, dependent neural networks.




I have used MediaPipe Pose detection for this model.


MediaPipe Pose is an ML solution for high-fidelity body pose tracking, inferring 33 3D landmarks on the whole body from RGB video frames. It builds on Google's BlazePose research, which also powers the ML Kit Pose Detection API. Current state-of-the-art approaches rely primarily on powerful desktop environments for inference, whereas this method achieves real-time performance on most modern mobile phones, desktops/laptops, in Python, and even on the web.


How does MediaPipe Pose detection work?


The solution utilizes a two-step detector-tracker ML pipeline, proven to be effective in the MediaPipe Hands and MediaPipe Face Mesh solutions. Using a detector, the pipeline first locates the person/pose region-of-interest (ROI) within the frame. The tracker subsequently predicts the pose landmarks within the ROI, using the ROI-cropped frame as input. Note that for video use cases the detector is invoked only as needed, i.e., for the very first frame and when the tracker can no longer identify body pose presence in the previous frame. For other frames, the pipeline simply derives the ROI from the previous frame's pose landmarks.
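
For video, this detector-tracker handoff happens inside the library; from Python we just feed the frames in order. A minimal sketch, assuming a local video file (the path is only a placeholder):

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

# Sketch: MediaPipe re-runs the person detector internally only when tracking is lost
cap = cv2.VideoCapture("yoga_video.mp4")  # placeholder path
with mp_pose.Pose(min_detection_confidence=0.5, min_tracking_confidence=0.5) as pose:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        # results.pose_landmarks holds the 33 landmarks for this frame (or None)
cap.release()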

For a better understanding, refer to the BlazePose research.


How did I train the model?


From what I have discovered, there are two ways to build such a model. The first is to feed the model a video or live feed of yourself performing a certain yoga pose and train it on that data, so that it can detect the action when you perform it in real time. The second method does not require any training, and it is a little more interesting too.


When we pass an image through MediaPipe Pose, we get the landmarks (coordinates) of the person in the image as a list.


                



From these coordinates, we can calculate the angle between different points.


For example, in the Tree Pose, I calculate the angle between the hip, knee, and foot and check whether it is less than 90 degrees. In the same way, for different yoga poses we can calculate the angles between different landmarks and predict the pose in the image.


Let's understand the code


First, let's import all the required libraries.


import cv2
import numpy as np
import mediapipe as mp

 
Now, prepare the drawing utilities and the Pose solution for drawing the pose landmarks later.


mp_drawing = mp.solutions.drawing_utils
mp_pose = mp.solutions.pose

 


For finding the angle between 3 points, check this article.


def calculate_angle(a, b, c):
    a = np.array(a)  # First point
    b = np.array(b)  # Mid point (the vertex of the angle)
    c = np.array(c)  # End point

    radians = np.arctan2(c[1] - b[1], c[0] - b[0]) - np.arctan2(a[1] - b[1], a[0] - b[0])
    angle = np.abs(radians * 180.0 / np.pi)

    if angle > 180.0:
        angle = 360 - angle

    return angle
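
As a quick sanity check, we can call the helper on a few made-up points (coordinates chosen purely for illustration): three collinear points should give roughly 180 degrees, while an L-shaped arrangement should give 90 degrees.

# Sanity check with made-up coordinates (not real landmarks)
print(calculate_angle([0.0, 0.0], [0.5, 0.5], [1.0, 1.0]))  # collinear points -> 180.0
print(calculate_angle([0.0, 1.0], [0.0, 0.0], [1.0, 0.0]))  # L-shaped arrangement -> 90.0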

 


Now, let's see the predict function


def predict(frame, output_directory):
    # Set up a MediaPipe Pose instance ('frame' is the path to the input image)
    with mp_pose.Pose(min_detection_confidence=0.5, min_tracking_confidence=0.5) as pose:
        image = cv2.imread(frame)
        height, width = image.shape[:2]  # getting the shape of the image

        # Recolor image to RGB
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        image.flags.writeable = False

        # Make detection
        results = pose.process(image)

        # Recolor back to BGR
        image.flags.writeable = True
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

        # Extract landmarks
        landmarks = results.pose_landmarks.landmark
        print(landmarks)

Here, we first set min_detection_confidence and min_tracking_confidence to 0.5.


What are min_detection_confidence and min_tracking_confidence?


MIN_DETECTION_CONFIDENCE


Minimum confidence value ([0.0, 1.0]) from the person-detection model for the detection to be considered successful. Defaults to 0.5.


MIN_TRACKING_CONFIDENCE


Minimum confidence value ([0.0, 1.0]) from the landmark-tracking model for the pose landmarks to be considered tracked successfully; otherwise, person detection will be invoked automatically on the next input image. Setting it to a higher value can increase the robustness of the solution, at the expense of higher latency. Ignored if static_image_mode is True, where person detection simply runs on every image. Defaults to 0.5.
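
Since predict() works on single images rather than a video stream, one could also set static_image_mode=True, in which case the tracking confidence is ignored and person detection runs on every input image. A minimal sketch (the file name is just a placeholder):

# Sketch only: for single images, run person detection on every input instead of tracking
with mp_pose.Pose(static_image_mode=True, min_detection_confidence=0.5) as pose:
    results = pose.process(cv2.cvtColor(cv2.imread("pose.jpg"), cv2.COLOR_BGR2RGB))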


Then, we read the image and convert the image from BGR to RGB before processing.


Then, we detect the landmarks of the image.


The output for landmarks is a list of 33 entries, each holding the coordinates of one body landmark. For more clarity, you can count the index of each item and compare it against the points in the image above.


[x: 0.500588595867157
y: 0.22035755217075348
z: -0.1684466451406479
visibility: 0.9999655485153198
, x: 0.5060998201370239
y: 0.20668834447860718
z: -0.1433268040418625
visibility: 0.9999816417694092
, x: 0.5099505186080933
y: 0.2070736289024353
z: -0.14337778091430664
visibility: 0.9999759197235107
, x: 0.5127272009849548
y: 0.2076309621334076
z: -0.1433948576450348
visibility: 0.999975323677063
, x: 0.49392595887184143
y: 0.20665624737739563
z: -0.15220026671886444
visibility: 0.9999812841415405
, x: 0.4896988868713379
y: 0.20688557624816895
z: -0.15228931605815887
visibility: 0.9999803304672241
, x: 0.48513248562812805
y: 0.20731249451637268
z: -0.15239396691322327
visibility: 0.9999831914901733
, x: 0.5167638659477234
y: 0.2151087373495102
z: -0.012955455109477043
visibility: 0.9999970197677612
, x: 0.4803517460823059
y: 0.21568699181079865
z: -0.04967144504189491
visibility: 0.9999960660934448
, x: 0.507878303527832
y: 0.23785164952278137
z: -0.11846306174993515
visibility: 0.9999810457229614
, x: 0.4915904402732849
y: 0.23844127357006073
z: -0.12903156876564026
visibility: 0.9999856948852539
, x: 0.5402618050575256
y: 0.29372262954711914
z: 0.05289541929960251
visibility: 0.9992573857307434
, x: 0.45479172468185425
y: 0.2939188480377197
z: -0.05460425838828087
visibility: 0.9999469518661499
, x: 0.5455511808395386
y: 0.17350983619689941
z: 0.03273892402648926
visibility: 0.9953662157058716
, x: 0.4561057984828949
y: 0.1688411831855774
z: -0.13315944373607635
visibility: 0.9995226860046387
, x: 0.5034188032150269
y: 0.09192818403244019
z: -0.029339080676436424
visibility: 0.9693756103515625
, x: 0.49075162410736084
y: 0.08348339796066284
z: -0.11438602954149246
visibility: 0.9964680671691895
, x: 0.5000284910202026
y: 0.06389960646629333
z: -0.05663144960999489
visibility: 0.9019898176193237
, x: 0.49557971954345703
y: 0.05285307765007019
z: -0.13335900008678436
visibility: 0.9821699857711792
, x: 0.5015572905540466
y: 0.06532785296440125
z: -0.0475655160844326
visibility: 0.893269956111908
, x: 0.4985186457633972
y: 0.05606105923652649
z: -0.12733954191207886
visibility: 0.9807336330413818
, x: 0.501517653465271
y: 0.07975029945373535
z: -0.03111031837761402
visibility: 0.9023681282997131
, x: 0.4982141852378845
y: 0.06985631585121155
z: -0.11347682029008865
visibility: 0.9812812209129333
, x: 0.5308948159217834
y: 0.5385144352912903
z: 0.04895031079649925
visibility: 0.9999910593032837
, x: 0.47694456577301025
y: 0.5437374114990234
z: -0.04859958589076996
visibility: 0.999998927116394
, x: 0.639201819896698
y: 0.644744336605072
z: -0.0056316605769097805
visibility: 0.9994655251502991
, x: 0.488059401512146
y: 0.7466689944267273
z: -0.018457671627402306
visibility: 0.9998968839645386
, x: 0.5150023698806763
y: 0.6279218792915344
z: 0.32945728302001953
visibility: 0.8660487532615662
, x: 0.5091462731361389
y: 0.9148943424224854
z: 0.12208233028650284
visibility: 0.9981447458267212
, x: 0.5058321356773376
y: 0.6049761176109314
z: 0.36336928606033325
visibility: 0.8977268934249878
, x: 0.5127518177032471
y: 0.9364270567893982
z: 0.13377633690834045
visibility: 0.9838213920593262
, x: 0.4967653751373291
y: 0.6919939517974854
z: 0.3319450914859772
visibility: 0.9028685092926025
, x: 0.5204339623451233
y: 0.9614124894142151
z: -0.0047808256931602955
visibility: 0.9962467551231384
]
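
Note that the x and y values above are normalized to [0, 1] relative to the image width and height. To place text or compare positions in pixel space (as the drawing code below does with np.multiply), they have to be scaled back up; a small sketch using the left knee as an example:

# Landmarks are normalized; multiply by the image size to get pixel coordinates
knee = landmarks[mp_pose.PoseLandmark.LEFT_KNEE.value]
knee_px = (int(knee.x * width), int(knee.y * height))  # (column, row) in pixels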

 


Now we calculate the angles between the points.


try:
    hip_left = [landmarks[mp_pose.PoseLandmark.LEFT_HIP.value].x, landmarks[mp_pose.PoseLandmark.LEFT_HIP.value].y]
    knee_left = [landmarks[mp_pose.PoseLandmark.LEFT_KNEE.value].x, landmarks[mp_pose.PoseLandmark.LEFT_KNEE.value].y]
    foot_left = [landmarks[mp_pose.PoseLandmark.LEFT_FOOT_INDEX.value].x, landmarks[mp_pose.PoseLandmark.LEFT_FOOT_INDEX.value].y]
    angle_left = calculate_angle(hip_left, knee_left, foot_left)
except:
    pass

try:
    hip_right = [landmarks[mp_pose.PoseLandmark.RIGHT_HIP.value].x, landmarks[mp_pose.PoseLandmark.RIGHT_HIP.value].y]
    knee_right = [landmarks[mp_pose.PoseLandmark.RIGHT_KNEE.value].x, landmarks[mp_pose.PoseLandmark.RIGHT_KNEE.value].y]
    foot_right = [landmarks[mp_pose.PoseLandmark.RIGHT_FOOT_INDEX.value].x, landmarks[mp_pose.PoseLandmark.RIGHT_FOOT_INDEX.value].y]
    angle_right = calculate_angle(hip_right, knee_right, foot_right)
except:
    pass

try:
    shoulder_left = [landmarks[mp_pose.PoseLandmark.LEFT_SHOULDER.value].x, landmarks[mp_pose.PoseLandmark.LEFT_SHOULDER.value].y]
    hip_left = [landmarks[mp_pose.PoseLandmark.LEFT_HIP.value].x, landmarks[mp_pose.PoseLandmark.LEFT_HIP.value].y]
    foot_left = [landmarks[mp_pose.PoseLandmark.LEFT_FOOT_INDEX.value].x, landmarks[mp_pose.PoseLandmark.LEFT_FOOT_INDEX.value].y]
    angle_downward = calculate_angle(shoulder_left, hip_left, foot_left)
except:
    pass

 


For my use case, the angles between these points were pretty much enough to predict the pose.


Now, let's see how we can use these angles to detect the pose in the image.


if int(angle_downward) < 90:
    mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_pose.POSE_CONNECTIONS,
                              mp_drawing.DrawingSpec(color=(245, 117, 66), thickness=2, circle_radius=2),
                              mp_drawing.DrawingSpec(color=(245, 66, 230), thickness=2, circle_radius=2))

    ans = str(int(angle_downward)) + " " + "degrees"
    cv2.putText(image, ans,
                tuple(np.multiply(knee_right, [width, height]).astype(int)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.7, (80, 80, 80), 2, cv2.LINE_AA)
    cv2.rectangle(image, (0, 0), (300, 50), (0, 0, 0), -1)

    # Display class
    cv2.putText(image, 'Downward-Facing Dog Pose', (8, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (192, 192, 192), 2, cv2.LINE_AA)
    output_file = output_directory + "image.jpg"
    cv2.imwrite(output_file, image)
    return output_file

elif int(angle_right) < 90 or int(angle_left) < 90:

    if int(angle_right) < 90:
        mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_pose.POSE_CONNECTIONS,
                                  mp_drawing.DrawingSpec(color=(245, 117, 66), thickness=2, circle_radius=2),
                                  mp_drawing.DrawingSpec(color=(245, 66, 230), thickness=2, circle_radius=2))
        ans = str(int(angle_right)) + " " + "degrees"
        cv2.putText(image, ans,
                    tuple(np.multiply(knee_right, [width, height]).astype(int)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.7, (80, 80, 80), 2, cv2.LINE_AA)

        cv2.rectangle(image, (0, 0), (240, 50), (0, 0, 0), -1)

        # Display class
        cv2.putText(image, 'Tree Pose', (8, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (192, 192, 192), 2, cv2.LINE_AA)
        output_file = output_directory + "image.jpg"
        cv2.imwrite(output_file, image)
        return output_file

    elif int(angle_left) < 90:
        mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_pose.POSE_CONNECTIONS,
                                  mp_drawing.DrawingSpec(color=(245, 117, 66), thickness=2, circle_radius=2),
                                  mp_drawing.DrawingSpec(color=(245, 66, 230), thickness=2, circle_radius=2))
        ans = str(int(angle_left)) + " " + "degrees"
        cv2.putText(image, ans,
                    tuple(np.multiply(knee_left, [width, height]).astype(int)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.7, (80, 80, 80), 2, cv2.LINE_AA)
        cv2.rectangle(image, (0, 0), (240, 50), (0, 0, 0), -1)

        # Display class
        cv2.putText(image, 'Tree Pose', (8, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (192, 192, 192), 2, cv2.LINE_AA)
        output_file = output_directory + "image.jpg"
        cv2.imwrite(output_file, image)
        return output_file

elif (int(angle_right) > 170 or int(angle_left) > 170) and int(angle_downward) > 100:

    mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_pose.POSE_CONNECTIONS,
                              mp_drawing.DrawingSpec(color=(245, 117, 66), thickness=2, circle_radius=2),
                              mp_drawing.DrawingSpec(color=(245, 66, 230), thickness=2, circle_radius=2))
    ans = str(int(angle_downward)) + " " + "degrees"
    cv2.putText(image, ans,
                tuple(np.multiply(knee_right, [width, height]).astype(int)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.75, (80, 80, 80), 2, cv2.LINE_AA)
    cv2.rectangle(image, (0, 0), (240, 50), (0, 0, 0), -1)

    # Display class
    cv2.putText(image, 'Mountain Pose', (8, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (192, 192, 192), 2, cv2.LINE_AA)
    output_file = output_directory + "image.jpg"
    cv2.imwrite(output_file, image)
    return output_file

 


Focus on the if and elif conditions, because they are pretty much the "detection" part here.
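
The thresholds in those conditions can be read as a small set of rules. Here is a compact sketch that only returns a label (the real code above annotates and saves the image instead):

def classify_pose(angle_left, angle_right, angle_downward):
    # Same thresholds as the if/elif block above
    if angle_downward < 90:
        return "Downward-Facing Dog Pose"
    if angle_left < 90 or angle_right < 90:
        return "Tree Pose"
    if (angle_left > 170 or angle_right > 170) and angle_downward > 100:
        return "Mountain Pose"
    return "Unknown"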


The main challenge I faced here was getting the angle thresholds in these conditions right.


Future Work


We can extend this model to help people correct their form by assessing them through an app.


                                                           


 

