I want to transform the output of the Google Vision API's face detection into a feature set for an ML classifier. For each training instance I get a list of predicted faces, represented as a list of dictionaries whose values are themselves dictionaries, and the values of these inner dictionaries are categorical, like this:
>>> faces[191:197]
[{'face_1': {'joy': 'VERY_UNLIKELY',
             'surprise': 'UNLIKELY',
             'anger': 'VERY_UNLIKELY',
             'sorrow': 'VERY_UNLIKELY',
             'headwear': 'VERY_UNLIKELY'}},
 {},
 {},
 {'face_1': {'joy': 'VERY_LIKELY',
             'surprise': 'LIKELY',
             'anger': 'VERY_UNLIKELY',
             'sorrow': 'VERY_UNLIKELY',
             'headwear': 'VERY_UNLIKELY'},
  'face_2': {'joy': 'VERY_UNLIKELY',
             'surprise': 'VERY_UNLIKELY',
             'anger': 'VERY_UNLIKELY',
             'sorrow': 'VERY_UNLIKELY',
             'headwear': 'VERY_LIKELY'}},
 {'face_1': {'joy': 'VERY_LIKELY',
             'surprise': 'VERY_UNLIKELY',
             'anger': 'VERY_UNLIKELY',
             'sorrow': 'VERY_UNLIKELY',
             'headwear': 'VERY_UNLIKELY'},
  'face_2': {'joy': 'POSSIBLE',
             'surprise': 'VERY_UNLIKELY',
             'anger': 'VERY_UNLIKELY',
             'sorrow': 'VERY_UNLIKELY',
             'headwear': 'VERY_UNLIKELY'}}]
My goal is to transform this into an ML-readable format. I would like to use an encoding that looks like this (where n is the maximum number of predicted faces in the entire dataset):
       joy_1  surprise_1  ...  anger_n  sorrow_n  headwear_n
img_1      1           2  ...        0         0           0
img_2      0           0  ...        0         0           0
img_3      0           0  ...        0         0           0
img_4      5           4  ...        0         0           0
  ...
I have used sklearn's DictVectorizer and LabelEncoder for other features that were lists of dicts, but those dicts didn't have dicts as values, as is the case for this data source.
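For those flat features, the usage was roughly like this sketch (illustrative, not my exact code; the feature names and values are made up):

from sklearn.feature_extraction import DictVectorizer

flat_features = [{"color": "red", "size": 2},
                 {"size": 1}]  # keys missing from a dict simply become 0
vec = DictVectorizer(sparse=False)
X = vec.fit_transform(flat_features)   # string values are one-hot encoded
print(vec.get_feature_names_out())     # ['color=red' 'size']
print(X)
# [[1. 2.]
#  [0. 1.]]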
I don't know of anything that would work out-of-the-box to map ordinal values (VERY_UNLIKELY, ..., VERY_LIKELY) to integers in a user-defined way while also handling face keys that may or may not be present in each dictionary.
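For the ordinal half of the problem in isolation, scikit-learn's OrdinalEncoder does accept an explicit category order; a minimal sketch of just that piece (it still leaves the nested face dictionaries and missing faces to handle by hand):

from sklearn.preprocessing import OrdinalEncoder

# Explicit ordering gives the user-defined integer mapping:
# VERY_UNLIKELY -> 0, ..., VERY_LIKELY -> 4.
order = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
encoder = OrdinalEncoder(categories=[order])

# Each row is one sample with a single likelihood feature.
print(encoder.fit_transform([["UNLIKELY"], ["VERY_LIKELY"], ["POSSIBLE"]]))
# [[1.]
#  [4.]
#  [2.]]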
Something like the following would probably be easiest here:
# Include `images` list-of-dicts from question
# images = [{'face_1': {'joy': 'VERY_UNLIKELY',
# ...]
import numpy as np

observations = ["joy", "surprise", "anger", "sorrow", "headwear"]
levels = {
    "VERY_UNLIKELY": 0,
    "UNLIKELY": 1,
    "POSSIBLE": 2,
    "LIKELY": 3,
    "VERY_LIKELY": 4,
}

N_IMAGES = len(images)
N_OBSERVATIONS = len(observations)
N_PEOPLE_PER_IMAGE = 2

# One row per image, one block of N_OBSERVATIONS columns per face slot.
# Empty dicts (no faces) yield no iterations, so their rows stay all zeros.
vector = np.zeros((N_IMAGES, N_PEOPLE_PER_IMAGE * N_OBSERVATIONS))
for i, image in enumerate(images):
    for j, face in enumerate(image):  # face is the key: 'face_1', 'face_2', ...
        start = j * N_OBSERVATIONS
        end = start + N_OBSERVATIONS
        vector[i, start:end] = [levels[image[face][obs]] for obs in observations]
print(vector)
Result:
[[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [4. 3. 0. 0. 0. 0. 0. 0. 0. 4.]
 [4. 0. 0. 0. 0. 2. 0. 0. 0. 0.]]
If there are up to 8 faces in each image, this could easily be extended by setting N_PEOPLE_PER_IMAGE = 8, or the count can be derived from the data itself, as sketched below.
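Here is a sketch of that: it computes n from the data and labels the columns in the joy_1, ..., headwear_n style from the question. It reuses images, observations, levels, and N_OBSERVATIONS from above, and the pandas wrapper is just one convenient way to get named rows and columns:

import numpy as np
import pandas as pd

# n = maximum number of predicted faces across the whole dataset
n_people = max((len(image) for image in images), default=0)

vector = np.zeros((len(images), n_people * N_OBSERVATIONS))
for i, image in enumerate(images):
    for j, face in enumerate(image):
        start = j * N_OBSERVATIONS
        vector[i, start:start + N_OBSERVATIONS] = [
            levels[image[face][obs]] for obs in observations
        ]

# Label rows and columns to match the desired layout.
columns = [f"{obs}_{k + 1}" for k in range(n_people) for obs in observations]
df = pd.DataFrame(vector, columns=columns,
                  index=[f"img_{i + 1}" for i in range(len(images))])
print(df)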