I am trying to estimate the head pose from single images, mostly following this guide: https://towardsdatascience.com/real-time-head-pose-estimation-in-python-e52db1bc606a
The detection of the face works fine: if I plot the image and the detected landmarks, they line up nicely.
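For reference, this is roughly how the landmarks are obtained (a minimal sketch following the guide, assuming dlib's 68-point shape predictor; the file names are placeholders):

import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# model file path is a placeholder - point it at your local copy
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

image = cv2.imread("face.jpg")  # placeholder file name
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

rects = detector(gray, 0)               # detected face rectangles
landmarks = predictor(gray, rects[0])   # 68 landmarks for the first face
shape = np.array([(p.x, p.y) for p in landmarks.parts()], dtype="double")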
I am estimating the camera matrix from the image size and assuming no lens distortion:
size = image.shape
focal_length = size[1]               # approximate the focal length by the image width
center = (size[1] / 2, size[0] / 2)  # assume the optical center is the image center
camera_matrix = np.array([[focal_length, 0, center[0]],
                          [0, focal_length, center[1]],
                          [0, 0, 1]], dtype="double")
dist_coeffs = np.zeros((4, 1))       # assuming no lens distortion
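(Approximating the focal length by the image width corresponds to a horizontal field of view of 2·atan(0.5) ≈ 53°, which seems a plausible default for a typical webcam.)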
I am trying to get the head pose by matching points in the image with points in a 3D model using cv2.solvePnP:
# 3D model points to which the points extracted from the image are matched:
model_points = np.array([
    (0.0, 0.0, 0.0),           # Nose tip
    (0.0, -330.0, -65.0),      # Chin
    (-225.0, 170.0, -135.0),   # Left eye left corner
    (225.0, 170.0, -135.0),    # Right eye right corner
    (-150.0, -150.0, -125.0),  # Left mouth corner
    (150.0, -150.0, -125.0)    # Right mouth corner
])
image_points = np.array([
    shape[30],  # Nose tip
    shape[8],   # Chin
    shape[36],  # Left eye left corner
    shape[45],  # Right eye right corner
    shape[48],  # Left mouth corner
    shape[54]   # Right mouth corner
], dtype="double")
success, rotation_vec, translation_vec = \
    cv2.solvePnP(model_points, image_points, camera_matrix, dist_coeffs)
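As a sanity check, the estimated pose can be projected back into the image (a sketch; the point 1000 units in front of the nose tip is an arbitrary choice to visualize the viewing direction):

# project a point 1000 units in front of the nose tip back into the image;
# a line from the nose tip to it shows where the head is pointing
nose_end_3d = np.array([(0.0, 0.0, 1000.0)])
nose_end_2d, _ = cv2.projectPoints(nose_end_3d, rotation_vec, translation_vec,
                                   camera_matrix, dist_coeffs)
p1 = (int(image_points[0][0]), int(image_points[0][1]))
p2 = (int(nose_end_2d[0][0][0]), int(nose_end_2d[0][0][1]))
cv2.line(image, p1, p2, (255, 0, 0), 2)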
Finally, I am getting the Euler angles from the rotation:
rotation_mat, _ = cv2.Rodrigues(rotation_vec)            # rotation vector -> 3x3 matrix
pose_mat = cv2.hconcat((rotation_mat, translation_vec))  # 3x4 [R|t] projection matrix
_, _, _, _, _, _, angles = cv2.decomposeProjectionMatrix(pose_mat)  # Euler angles (degrees)
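For comparison, here is a minimal sketch that reads the angles directly from the rotation matrix instead, assuming the common R = Rz·Ry·Rx convention (results in radians, while decomposeProjectionMatrix returns degrees):

import math

sy = math.sqrt(rotation_mat[0, 0] ** 2 + rotation_mat[1, 0] ** 2)
elevation = math.atan2(rotation_mat[2, 1], rotation_mat[2, 2])  # rotation about x
azimuth = math.atan2(-rotation_mat[2, 0], sy)                   # rotation about y
roll = math.atan2(rotation_mat[1, 0], rotation_mat[0, 0])       # rotation about z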
Now, the azimuth is what I would expect: it is negative if I look to the left, zero in the middle, and positive if I look to the right.
The elevation, however, is strange: if I look straight ahead it has a roughly constant magnitude (around 170), but the sign is random, changing from image to image.
When I look up, the sign is positive and the value decreases the more I look up; when I look down, the sign is negative and the value decreases the more I look down.
Can someone explain this output to me?
OK, so it seems I have found a solution: the model points (which I had found in several blogs on the topic) seem to be wrong. The code works with this combination of model and image points (no idea why; it was trial and error):
model_points = np.float32([[6.825897, 6.760612, 4.402142],    # left brow left corner
                           [1.330353, 7.122144, 6.903745],    # left brow right corner
                           [-1.330353, 7.122144, 6.903745],   # right brow left corner
                           [-6.825897, 6.760612, 4.402142],   # right brow right corner
                           [5.311432, 5.485328, 3.987654],    # left eye left corner
                           [1.789930, 5.393625, 4.413414],    # left eye right corner
                           [-1.789930, 5.393625, 4.413414],   # right eye left corner
                           [-5.311432, 5.485328, 3.987654],   # right eye right corner
                           [2.005628, 1.409845, 6.165652],    # left nose corner
                           [-2.005628, 1.409845, 6.165652],   # right nose corner
                           [2.774015, -2.080775, 5.048531],   # left mouth corner
                           [-2.774015, -2.080775, 5.048531],  # right mouth corner
                           [0.000000, -3.116408, 6.097667],   # mouth bottom
                           [0.000000, -7.415691, 4.070434]])  # chin
image_points = np.float32([shape[17], shape[21], shape[22], shape[26],  # brow corners
                           shape[36], shape[39], shape[42], shape[45],  # eye corners
                           shape[31], shape[35], shape[48], shape[54],  # nose and mouth corners
                           shape[57], shape[8]])                        # mouth bottom, chin
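With these points, the rest of the pipeline stays exactly as above (a sketch):

success, rotation_vec, translation_vec = \
    cv2.solvePnP(model_points, image_points, camera_matrix, dist_coeffs)
rotation_mat, _ = cv2.Rodrigues(rotation_vec)
pose_mat = cv2.hconcat((rotation_mat, translation_vec))
_, _, _, _, _, _, angles = cv2.decomposeProjectionMatrix(pose_mat)
elevation, azimuth, roll = angles.flatten()  # degrees: rotations about x, y, z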