I want to transform the output of the Google Vision API's face detection into a feature set for an ML classifier. For each training instance I get a list of predicted faces, represented as a list of dictionaries whose values are themselves dictionaries, and the values of these inner dictionaries are categorical, like this:
>>> faces[191:197]
[{'face_1': {'joy': 'VERY_UNLIKELY',
             'surprise': 'UNLIKELY',
             'anger': 'VERY_UNLIKELY',
             'sorrow': 'VERY_UNLIKELY',
             'headwear': 'VERY_UNLIKELY'}},
 {},
 {},
 {'face_1': {'joy': 'VERY_LIKELY',
             'surprise': 'LIKELY',
             'anger': 'VERY_UNLIKELY',
             'sorrow': 'VERY_UNLIKELY',
             'headwear': 'VERY_UNLIKELY'},
  'face_2': {'joy': 'VERY_UNLIKELY',
             'surprise': 'VERY_UNLIKELY',
             'anger': 'VERY_UNLIKELY',
             'sorrow': 'VERY_UNLIKELY',
             'headwear': 'VERY_LIKELY'}},
 {'face_1': {'joy': 'VERY_LIKELY',
             'surprise': 'VERY_UNLIKELY',
             'anger': 'VERY_UNLIKELY',
             'sorrow': 'VERY_UNLIKELY',
             'headwear': 'VERY_UNLIKELY'},
  'face_2': {'joy': 'POSSIBLE',
             'surprise': 'VERY_UNLIKELY',
             'anger': 'VERY_UNLIKELY',
             'sorrow': 'VERY_UNLIKELY',
             'headwear': 'VERY_UNLIKELY'}}]
My goal is to transform this into an ML-readable format. I would like to use an encoding that looks like this (where n is the maximum number of predicted faces in the entire dataset):
       joy_1  surprise_1  ...  anger_n  sorrow_n  headwear_n
img_1      1           2  ...        0         0           0
img_2      0           0  ...        0         0           0
img_3      0           0  ...        0         0           0
img_4      5           4  ...        0         0           0
  ...
I have used sklearn's DictVectorizer and LabelEncoder for other features that were lists of dicts, but those dicts didn't have dicts as values, as is the case for this data source.
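For those flat features, the usage was roughly like this sketch (illustrative, not my exact code; the feature names and values are made up):

from sklearn.feature_extraction import DictVectorizer

flat_features = [{"color": "red", "size": 2},
                 {"size": 1}]  # keys missing from a dict simply become 0
vec = DictVectorizer(sparse=False)
X = vec.fit_transform(flat_features)   # string values are one-hot encoded
print(vec.get_feature_names_out())     # ['color=red' 'size']
print(X)
# [[1. 2.]
#  [0. 1.]]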
I don't know of anything that would work out-of-the-box to map ordinal values (VERY_UNLIKELY, ..., VERY_LIKELY) to integers in a user-defined way while also handling face keys that may or may not be present in each dictionary.
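For the ordinal half of the problem in isolation, scikit-learn's OrdinalEncoder does accept an explicit category order; a minimal sketch of just that piece (it still leaves the nested face dictionaries and missing faces to handle by hand):

from sklearn.preprocessing import OrdinalEncoder

# Explicit ordering gives the user-defined integer mapping:
# VERY_UNLIKELY -> 0, ..., VERY_LIKELY -> 4.
order = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
encoder = OrdinalEncoder(categories=[order])

# Each row is one sample with a single likelihood feature.
print(encoder.fit_transform([["UNLIKELY"], ["VERY_LIKELY"], ["POSSIBLE"]]))
# [[1.]
#  [4.]
#  [2.]]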
Something like the following would probably be easiest here:
# Include `images` list-of-dicts from question
# images = [{'face_1': {'joy': 'VERY_UNLIKELY',
# ...]
import numpy as np

observations = ["joy", "surprise", "anger", "sorrow", "headwear"]
levels = {
    "VERY_UNLIKELY": 0,
    "UNLIKELY": 1,
    "POSSIBLE": 2,
    "LIKELY": 3,
    "VERY_LIKELY": 4,
}

N_IMAGES = len(images)
N_OBSERVATIONS = len(observations)
N_PEOPLE_PER_IMAGE = 2

# One row per image, one block of N_OBSERVATIONS columns per face slot.
# Empty dicts (no faces) yield no iterations, so their rows stay all zeros.
vector = np.zeros((N_IMAGES, N_PEOPLE_PER_IMAGE * N_OBSERVATIONS))
for i, image in enumerate(images):
    for j, face in enumerate(image):  # face is the key: 'face_1', 'face_2', ...
        start = j * N_OBSERVATIONS
        end = start + N_OBSERVATIONS
        vector[i, start:end] = [levels[image[face][obs]] for obs in observations]
print(vector)
Result:
[[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [4. 3. 0. 0. 0. 0. 0. 0. 0. 4.]
 [4. 0. 0. 0. 0. 2. 0. 0. 0. 0.]]
If there are up to 8 faces in each image, this could easily be extended by setting N_PEOPLE_PER_IMAGE = 8, or the count can be derived from the data itself, as sketched below.
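Here is a sketch of that: it computes n from the data and labels the columns in the joy_1, ..., headwear_n style from the question. It reuses images, observations, levels, and N_OBSERVATIONS from above, and the pandas wrapper is just one convenient way to get named rows and columns:

import numpy as np
import pandas as pd

# n = maximum number of predicted faces across the whole dataset
n_people = max((len(image) for image in images), default=0)

vector = np.zeros((len(images), n_people * N_OBSERVATIONS))
for i, image in enumerate(images):
    for j, face in enumerate(image):
        start = j * N_OBSERVATIONS
        vector[i, start:start + N_OBSERVATIONS] = [
            levels[image[face][obs]] for obs in observations
        ]

# Label rows and columns to match the desired layout.
columns = [f"{obs}_{k + 1}" for k in range(n_people) for obs in observations]
df = pd.DataFrame(vector, columns=columns,
                  index=[f"img_{i + 1}" for i in range(len(images))])
print(df)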