I am trying to estimate the head pose from single images, mostly following this guide: https://towardsdatascience.com/real-time-head-pose-estimation-in-python-e52db1bc606a
The detection of the face works fine: if I plot the image and the detected landmarks, they line up nicely.
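For reference, this is roughly how the landmarks are obtained (a minimal sketch following the guide, assuming dlib's 68-point shape predictor; the file names are placeholders):

import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# model file path is a placeholder - point it at your local copy
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

image = cv2.imread("face.jpg")  # placeholder file name
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

rects = detector(gray, 0)               # detected face rectangles
landmarks = predictor(gray, rects[0])   # 68 landmarks for the first face
shape = np.array([(p.x, p.y) for p in landmarks.parts()], dtype="double")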
I am estimating the camera matrix from the image size and assuming no lens distortion:
size = image.shape
focal_length = size[1]               # approximate the focal length by the image width
center = (size[1] / 2, size[0] / 2)  # assume the optical center is the image center
camera_matrix = np.array([[focal_length, 0, center[0]],
                          [0, focal_length, center[1]],
                          [0, 0, 1]], dtype="double")
dist_coeffs = np.zeros((4, 1))       # assuming no lens distortion
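(Approximating the focal length by the image width corresponds to a horizontal field of view of 2·atan(0.5) ≈ 53°, which seems a plausible default for a typical webcam.)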
I am trying to get the head pose by matching points in the image with points in a 3D model using cv2.solvePnP:
# 3D model points to which the points extracted from the image are matched:
model_points = np.array([
    (0.0, 0.0, 0.0),           # Nose tip
    (0.0, -330.0, -65.0),      # Chin
    (-225.0, 170.0, -135.0),   # Left eye left corner
    (225.0, 170.0, -135.0),    # Right eye right corner
    (-150.0, -150.0, -125.0),  # Left mouth corner
    (150.0, -150.0, -125.0)    # Right mouth corner
])
image_points = np.array([
    shape[30],  # Nose tip
    shape[8],   # Chin
    shape[36],  # Left eye left corner
    shape[45],  # Right eye right corner
    shape[48],  # Left mouth corner
    shape[54]   # Right mouth corner
], dtype="double")
success, rotation_vec, translation_vec = \
    cv2.solvePnP(model_points, image_points, camera_matrix, dist_coeffs)
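As a sanity check, the estimated pose can be projected back into the image (a sketch; the point 1000 units in front of the nose tip is an arbitrary choice to visualize the viewing direction):

# project a point 1000 units in front of the nose tip back into the image;
# a line from the nose tip to it shows where the head is pointing
nose_end_3d = np.array([(0.0, 0.0, 1000.0)])
nose_end_2d, _ = cv2.projectPoints(nose_end_3d, rotation_vec, translation_vec,
                                   camera_matrix, dist_coeffs)
p1 = (int(image_points[0][0]), int(image_points[0][1]))
p2 = (int(nose_end_2d[0][0][0]), int(nose_end_2d[0][0][1]))
cv2.line(image, p1, p2, (255, 0, 0), 2)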
Finally, I am getting the Euler angles from the rotation:
rotation_mat, _ = cv2.Rodrigues(rotation_vec)            # rotation vector -> 3x3 matrix
pose_mat = cv2.hconcat((rotation_mat, translation_vec))  # 3x4 [R|t] projection matrix
_, _, _, _, _, _, angles = cv2.decomposeProjectionMatrix(pose_mat)  # Euler angles (degrees)
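For comparison, here is a minimal sketch that reads the angles directly from the rotation matrix instead, assuming the common R = Rz·Ry·Rx convention (results in radians, while decomposeProjectionMatrix returns degrees):

import math

sy = math.sqrt(rotation_mat[0, 0] ** 2 + rotation_mat[1, 0] ** 2)
elevation = math.atan2(rotation_mat[2, 1], rotation_mat[2, 2])  # rotation about x
azimuth = math.atan2(-rotation_mat[2, 0], sy)                   # rotation about y
roll = math.atan2(rotation_mat[1, 0], rotation_mat[0, 0])       # rotation about z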
Now, the azimuth is what I would expect: it is negative if I look to the left, zero in the middle, and positive if I look to the right.
The elevation, however, is strange: if I look straight ahead it has a roughly constant magnitude (around 170), but the sign is random, changing from image to image.
When I look up, the sign is positive and the value decreases the more I look up; when I look down, the sign is negative and the value decreases the more I look down.
Can someone explain this output to me?
OK, so it seems I have found a solution: the model points (which I had found in several blogs on the topic) seem to be wrong. The code works with this combination of model and image points (no idea why; it was trial and error):
model_points = np.float32([[6.825897, 6.760612, 4.402142],    # left brow left corner
                           [1.330353, 7.122144, 6.903745],    # left brow right corner
                           [-1.330353, 7.122144, 6.903745],   # right brow left corner
                           [-6.825897, 6.760612, 4.402142],   # right brow right corner
                           [5.311432, 5.485328, 3.987654],    # left eye left corner
                           [1.789930, 5.393625, 4.413414],    # left eye right corner
                           [-1.789930, 5.393625, 4.413414],   # right eye left corner
                           [-5.311432, 5.485328, 3.987654],   # right eye right corner
                           [2.005628, 1.409845, 6.165652],    # left nose corner
                           [-2.005628, 1.409845, 6.165652],   # right nose corner
                           [2.774015, -2.080775, 5.048531],   # left mouth corner
                           [-2.774015, -2.080775, 5.048531],  # right mouth corner
                           [0.000000, -3.116408, 6.097667],   # mouth bottom
                           [0.000000, -7.415691, 4.070434]])  # chin
image_points = np.float32([shape[17], shape[21], shape[22], shape[26],  # brow corners
                           shape[36], shape[39], shape[42], shape[45],  # eye corners
                           shape[31], shape[35], shape[48], shape[54],  # nose and mouth corners
                           shape[57], shape[8]])                        # mouth bottom, chin
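With these points, the rest of the pipeline stays exactly as above (a sketch):

success, rotation_vec, translation_vec = \
    cv2.solvePnP(model_points, image_points, camera_matrix, dist_coeffs)
rotation_mat, _ = cv2.Rodrigues(rotation_vec)
pose_mat = cv2.hconcat((rotation_mat, translation_vec))
_, _, _, _, _, _, angles = cv2.decomposeProjectionMatrix(pose_mat)
elevation, azimuth, roll = angles.flatten()  # degrees: rotations about x, y, z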