I am trying to perform head pose estimation (determining the yaw, pitch, and roll of a face image). I first do face and landmark detection to obtain the 2D face landmark coordinates. Using these coordinates, together with 3D reference face landmark coordinates, I call OpenCV's solvePnP, then Rodrigues, then decomposeProjectionMatrix to get the Euler angles.
Given the facial landmarks, a sample implementation might look like this:
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
cam_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
cam_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
c_x = cam_w / 2
c_y = cam_h / 2
# Approximate the focal length from an assumed 60-degree horizontal FOV.
f_x = c_x / np.tan(60 / 2 * np.pi / 180)
f_y = f_x

# Estimated camera matrix values.
cam_matrix = np.float32([[f_x, 0.0, c_x],
                         [0.0, f_y, c_y],
                         [0.0, 0.0, 1.0]])

# Assume no lens distortion.
camera_distortion = np.float32([0.0, 0.0, 0.0, 0.0, 0.0])
# 3D reference model coordinates of the 14 facial landmarks.
object_pts = np.float32([[6.825897, 6.760612, 4.402142],
                         [1.330353, 7.122144, 6.903745],
                         [-1.330353, 7.122144, 6.903745],
                         [-6.825897, 6.760612, 4.402142],
                         [5.311432, 5.485328, 3.987654],
                         [1.789930, 5.393625, 4.413414],
                         [-1.789930, 5.393625, 4.413414],
                         [-5.311432, 5.485328, 3.987654],
                         [2.005628, 1.409845, 6.165652],
                         [-2.005628, 1.409845, 6.165652],
                         [2.774015, -2.080775, 5.048531],
                         [-2.774015, -2.080775, 5.048531],
                         [0.000000, -3.116408, 6.097667],
                         [0.000000, -7.415691, 4.070434]])

# Corners of a cube around the model, reprojected to visualize the pose.
reprojectsrc = np.float32([[10.0, 10.0, 10.0],
                           [10.0, 10.0, -10.0],
                           [10.0, -10.0, -10.0],
                           [10.0, -10.0, 10.0],
                           [-10.0, 10.0, 10.0],
                           [-10.0, 10.0, -10.0],
                           [-10.0, -10.0, -10.0],
                           [-10.0, -10.0, 10.0]])
def get_head_pose(landmarks):
    # 2D image coordinates of the same 14 landmarks, in model order.
    image_pts = np.float32([[landmarks[43].x, landmarks[43].y],
                            [landmarks[50].x, landmarks[50].y],
                            [landmarks[102].x, landmarks[102].y],
                            [landmarks[101].x, landmarks[101].y],
                            [landmarks[35].x, landmarks[35].y],
                            [landmarks[39].x, landmarks[39].y],
                            [landmarks[89].x, landmarks[89].y],
                            [landmarks[93].x, landmarks[93].y],
                            [landmarks[78].x, landmarks[78].y],
                            [landmarks[84].x, landmarks[84].y],
                            [landmarks[52].x, landmarks[52].y],
                            [landmarks[61].x, landmarks[61].y],
                            [landmarks[53].x, landmarks[53].y],
                            [landmarks[0].x, landmarks[0].y]])

    _, rotation_vec, translation_vec = cv2.solvePnP(object_pts, image_pts,
                                                    cam_matrix, camera_distortion)

    reprojectdst, _ = cv2.projectPoints(reprojectsrc, rotation_vec, translation_vec,
                                        cam_matrix, camera_distortion)
    reprojectdst = tuple(map(tuple, reprojectdst.reshape(8, 2)))

    # Convert the rotation vector to a matrix, then decompose the
    # [R|t] pose matrix to obtain the Euler angles.
    rotation_mat, _ = cv2.Rodrigues(rotation_vec)
    pose_mat = cv2.hconcat((rotation_mat, translation_vec))
    _, _, _, _, _, _, euler_angle = cv2.decomposeProjectionMatrix(pose_mat)

    return reprojectdst, euler_angle
As I need this solution to be generic across cameras, I approximate the focal length from the image dimensions, based on code I found in the deepgaze repository. I also set the distortion coefficients to zero.
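The approximation can be packaged as a small helper. This is only a sketch: the 60-degree horizontal field of view is an assumption carried over from the code above, and `approx_camera_matrix` is a hypothetical name, not part of deepgaze or OpenCV:

```python
import numpy as np

def approx_camera_matrix(width, height, fov_deg=60.0):
    """Build an approximate pinhole camera matrix for an uncalibrated camera.

    The principal point is taken as the image center and the focal length
    is derived from an assumed horizontal field of view (default 60 degrees).
    """
    c_x, c_y = width / 2.0, height / 2.0
    f = c_x / np.tan(np.radians(fov_deg / 2.0))
    return np.float32([[f, 0.0, c_x],
                       [0.0, f, c_y],
                       [0.0, 0.0, 1.0]])
```

For a 640x480 frame this gives a focal length of roughly 554 pixels, matching the inline computation above.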
This solution seems to work fairly well, and in particular when the face is in the center of the image.
However, I ran a quick experiment in which I had the camera face a wall, and then I moved a face image up and down the flat wall. What I noticed is that as the face moves away from the center of the image, the computed pitch angle is no longer zero. Why is this? How can I adjust for this? Or is this actually the desired behavior?
Is this because the line of sight from the lens to the face is no longer perpendicular to the image plane (and therefore it's that incidence angle that's being reported)? Or does this have something to do with radial distortion? Or what else could be at play here?
Edit: A minimal reproducible example can be found here. That being said, I think the issue has less to do with a bug in my code, as I have tested several implementations that exhibit this behavior. I am more wondering about the theory of why this happens.
This could be due to distortion in the image. It's good practice to calibrate your camera to estimate the distortion coefficients and use them for any kind of pose estimation.
Looking at the edges of your windows, it does seem that there is significant barrel distortion in your case.