Tags: python, mediapipe, pose-detection

Mediapipe pose_landmarks out of range [0, 1]


The MediaPipe documentation states: "x and y: Landmark coordinates normalized to [0.0, 1.0] by the image width and height respectively." However, I'm getting values outside that range.

mediapipe 0.10.1, Python 3.8.10

#!/usr/bin/env python3

import numpy as np
import cv2
import mediapipe as mp
import time

class HumanPoseDetection:
    def __init__(self):
        # TODO: change the path
        model_path = "/home/user/models/pose_landmarker_full.task"
        BaseOptions = mp.tasks.BaseOptions
        self.PoseLandmarker = mp.tasks.vision.PoseLandmarker
        PoseLandmarkerOptions = mp.tasks.vision.PoseLandmarkerOptions
        self.result = mp.tasks.vision.PoseLandmarkerResult
        VisionRunningMode = mp.tasks.vision.RunningMode       

        self.options = PoseLandmarkerOptions(
            base_options=BaseOptions(model_asset_path=model_path),
            running_mode=VisionRunningMode.LIVE_STREAM,
            result_callback=self.callback
            )
        
    def callback(self, result, output_image, timestamp_ms):
        if result.pose_landmarks:
            self.result = result.pose_landmarks[0]
            for idx, elem in enumerate(self.result):
                if not (0 <= elem.x <= 1 and 0 <= elem.y <= 1):
                    print("Warning, out of range values: {}".format(elem))

    def detect_pose(self):
        cap = cv2.VideoCapture(0)
        with self.PoseLandmarker.create_from_options(self.options) as landmarker:
            while cap.isOpened():
                success, image = cap.read()
                if not success:
                    continue
                image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                image = cv2.resize(image, (224, 224)) 
                mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=image)
                frame_timestamp_ms = int(time.time() * 1000)
                landmarker.detect_async(mp_image, frame_timestamp_ms)

if __name__=="__main__":
    HPD_ = HumanPoseDetection()
    HPD_.detect_pose()

A workaround proposed here is to use min, but in my case I need the normalized x and y, not the pixel coordinates. Also, this workaround doesn't seem to be accurate:

x_px = min(math.floor(normalized_x * image_width), image_width - 1)
y_px = min(math.floor(normalized_y * image_height), image_height - 1)

Can you please tell me how I can solve this issue? Thanks in advance.

Solution

  • The coordinates from pose estimation will be outside the range [0, 1] if the estimated position of the keypoint is off-screen. For example, if I put my hand below the webcam's field of view, the y coordinate will be greater than 1.

    This is because the coordinates are normalized to the image height and width, but the pose estimator still provides estimates for keypoints it can't see.

    Since the visibility of an off-screen keypoint should be low, you can filter out these keypoints by raising the confidence threshold when you create the pose estimator.
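    To illustrate the filtering idea without a webcam, here is a minimal sketch. The `Landmark` dataclass and `filter_visible` helper are stand-ins I made up for this example; the real landmark objects returned by MediaPipe expose the same `x`, `y`, and `visibility` attributes:

```python
from dataclasses import dataclass

@dataclass
class Landmark:
    """Stand-in for a MediaPipe normalized landmark (x, y in [0, 1] when on-screen)."""
    x: float
    y: float
    visibility: float

def filter_visible(landmarks, min_visibility=0.5):
    """Keep only landmarks the model is reasonably confident are on-screen."""
    return [lm for lm in landmarks if lm.visibility >= min_visibility]

landmarks = [
    Landmark(0.42, 0.37, 0.98),  # clearly in frame
    Landmark(0.51, 1.23, 0.05),  # estimated below the frame, low visibility
]
print(filter_visible(landmarks))  # only the first landmark survives
```

    The same check could be applied inside the callback before using `elem.x` and `elem.y`.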

    According to the example here (https://github.com/googlesamples/mediapipe/blob/main/examples/pose_landmarker/python/%5BMediaPipe_Python_Tasks%5D_Pose_Landmarker.ipynb), you should be able to modify your code as follows to add min_pose_detection_confidence:

    self.options = PoseLandmarkerOptions(
        base_options=BaseOptions(model_asset_path=model_path),
        running_mode=VisionRunningMode.LIVE_STREAM,
        min_pose_detection_confidence=0.5,
        result_callback=self.callback
    )
    

    I have used 0.5 (50%) as an example; your results may be better with a different threshold. See min_pose_detection_confidence in the documentation: https://developers.google.com/mediapipe/solutions/vision/pose_landmarker/python#live-stream


    Alternatively, if you don't mind keypoints being estimated while they're off-screen, there may be no problem with the pose estimator returning them. Just treat coordinates outside [0, 1] as off-screen.
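    If you go that route but still need coordinates inside [0, 1] (e.g. for drawing), you can clamp them yourself. This is a generic helper, not a MediaPipe API:

```python
def clamp01(v):
    """Clip a normalized coordinate into the [0.0, 1.0] range."""
    return min(max(v, 0.0), 1.0)

# Off-screen estimates get pinned to the nearest edge;
# in-range values pass through unchanged.
print(clamp01(1.23))   # 1.0
print(clamp01(-0.07))  # 0.0
print(clamp01(0.42))   # 0.42
```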