I was building a system that processes poses from videos using Python, along with a JavaScript (React) application that estimates the user's pose from the webcam in real time and compares it with the Python-processed poses.
The problem is that I started getting very different coordinate results. As a test, I ran the same video through both applications, and the outputs were wildly discrepant. I've tried to find a pattern to transform the data (sometimes the X axis in Python seems to be the Y axis in JavaScript, and vice versa), but after testing more than one scenario, I just couldn't find a reliable transformation to match the data.
I'm using the same version of MediaPipe in both applications. I know the Python and JavaScript MediaPipe implementations can differ slightly... but are they really that different, or am I missing something?
Thank you!
For those who someday struggle with the same thing, here is the solution I found: the output of the MediaPipe BlazePose estimatePose method in JavaScript is analogous to pose_world_landmarks from the Python library, not to pose_landmarks. The difference is described here: https://github.com/google/mediapipe/blob/master/docs/solutions/pose.md
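In short, per the linked docs: pose_landmarks are normalized to the image (x and y in [0, 1], origin at the top-left corner), while pose_world_landmarks are real-world 3D coordinates in meters with the origin at the midpoint between the hips. So comparing one against the other directly will never line up. As a minimal sketch of the distinction (the helper function and the landmark dict here are hypothetical, just for illustration, not part of the MediaPipe API), normalized image-space landmarks must be denormalized by the frame size before comparing pixel positions:

```python
# pose_landmarks: normalized image coordinates, x/y in [0, 1], top-left origin.
# pose_world_landmarks: meters, origin at the midpoint between the hips.
# This hypothetical helper converts a normalized landmark to pixel coordinates.

def to_pixel_coords(landmark, image_width, image_height):
    """Convert a normalized landmark (x, y in [0, 1]) to pixel coordinates."""
    return (landmark["x"] * image_width, landmark["y"] * image_height)

# Example: a normalized landmark on a 1280x720 frame.
nose = {"x": 0.5, "y": 0.25}
print(to_pixel_coords(nose, 1280, 720))  # (640.0, 180.0)
```

A world landmark of, say, (0.1, -0.4, 0.2) meters has no such mapping to pixels, which is why the two outputs looked like they followed no consistent axis-swap pattern.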