Tags: javascript, python, computer-vision, mediapipe, pose

Blazepose Mediapipe: Differences between Python and Javascript implementation


I am building a system that processes poses from videos using Python, plus a JavaScript (React) application that estimates the user's pose from a webcam in real time and compares it with the poses processed in Python.

The problem is that I started getting very different coordinates. As a test, I ran the same video through both applications, and the results diverged widely. I tried to find a pattern for transforming the data (sometimes the X axis in Python seems to be the Y axis in JavaScript, and vice versa), but after testing more than one scenario I couldn't find a reliable transformation that makes the data match.

I'm using the same version of MediaPipe in both applications. I know that the Python and JavaScript implementations of MediaPipe can differ slightly, but are they really that different, or am I missing something?

Thank you!


Solution

  • For those who someday struggle with the same thing, here is what I found as a solution:

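    The MediaPipe BlazePose estimatePose method in JavaScript is analogous to the pose_world_landmarks output of the Python library, not to the pose_landmarks output! Per the documentation, pose_landmarks holds x and y normalized to [0.0, 1.0] by the image width and height, while pose_world_landmarks holds real-world 3D coordinates in meters with the origin at the center between the hips. The difference is described here: https://github.com/google/mediapipe/blob/master/docs/solutions/pose.md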
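
To make the distinction concrete, here is a minimal Python sketch of reading both outputs from a single frame so they can be compared against the JavaScript side. It assumes the legacy `mp.solutions.pose` API and OpenCV for image loading. The helper names (`to_rows`, `normalized_to_pixels`, `extract_both`) and the image path argument are made up for illustration and are not part of MediaPipe.

```python
from types import SimpleNamespace


def to_rows(landmark_list):
    """Flatten a MediaPipe landmark list into (x, y, z) tuples (hypothetical helper)."""
    if landmark_list is None:  # MediaPipe returns None when no person is detected
        return []
    return [(lm.x, lm.y, lm.z) for lm in landmark_list.landmark]


def normalized_to_pixels(rows, width, height):
    """Scale the normalized x/y of pose_landmarks to pixel coordinates."""
    return [(x * width, y * height) for x, y, _ in rows]


def extract_both(image_path):
    """Run the Python solution once and return both outputs (hypothetical helper)."""
    # Imported lazily so the helpers above work without these packages installed.
    import cv2
    import mediapipe as mp

    image = cv2.imread(image_path)
    with mp.solutions.pose.Pose(static_image_mode=True) as pose:
        results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    # pose_landmarks: x/y normalized to the image size.
    # pose_world_landmarks: meters, origin at the center between the hips.
    return to_rows(results.pose_landmarks), to_rows(results.pose_world_landmarks)


# Stand-in object with the same shape as a MediaPipe landmark list,
# so the helpers can be tried without a model or an image.
fake = SimpleNamespace(landmark=[SimpleNamespace(x=0.5, y=0.25, z=-0.1)])
rows = to_rows(fake)
pixels = normalized_to_pixels(rows, 640, 480)
```

When comparing with the JavaScript results, use the second value returned by `extract_both` (the world landmarks) rather than the first, following the finding above.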