
Face detection model returns empty dict (Google Cloud Video Intelligence)


I'm having issues with the face detection model from the Google Video Intelligence API.

I'm using Python 3.6.5, and google-cloud-videointelligence==1.15.0.

Occasionally I will receive a mangled response from the face detection model. I am parsing the response from the API by converting it into a dictionary using google.protobuf.json_format.MessageToDict(). I expect one of two behaviours to occur:

A. If faces are present in the video, I expect the results to be under the key 'FaceDetectionAnnotations' and to take the form of a dictionary of dictionaries, with the keys of the outer dictionary being the 'segment number' (an integer) and the inner dictionaries looking something like this:

{'coordinates': {'left': 0.3432,
   'top': 0.075,
   'right': 0.6667,
   'bottom': 0.7435},
  'labels': {'confidence': 1.0,
   'attributes': [{'name': 'glasses', 'confidence': 0.041921083},
    {'name': 'headwear', 'confidence': 0.10601594},
    {'name': 'eyes_visible', 'confidence': 0.9976739},
    {'name': 'mouth_open', 'confidence': 0.005100015},
    {'name': 'looking_at_camera', 'confidence': 0.9647807},
    {'name': 'smiling', 'confidence': 0.017670842}]}}

B. If faces are not present in the video, I expect there to be no such 'FaceDetectionAnnotations' key anywhere in the results.

However, occasionally I am seeing a third kind of response, where the 'FaceDetectionAnnotations' key is present in the results (suggesting that the face detection model did in fact detect faces), but each of the inner dictionaries is completely empty. There is still one inner dictionary for each segment, but they contain none of the usual information, such as the start and end times of the segments, or any coordinates or confidence values.
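The three kinds of response described above can be told apart with a small check on the parsed dictionary. The following is only a sketch: it assumes the results are a plain dictionary shaped as described in this question, and the names `FACE_KEY`, `empty_segments` and `classify_face_results`, as well as the returned labels, are made up for illustration and are not part of the Video Intelligence client library.

```python
from typing import Any, Dict, List

# Key name as it appears in this question (assumption: adjust it if your
# parsed response uses a different key).
FACE_KEY = "FaceDetectionAnnotations"


def empty_segments(results: Dict[str, Any]) -> List[Any]:
    """Return the segment keys whose inner dictionary is empty."""
    segments = results.get(FACE_KEY, {})
    return [seg for seg, data in segments.items() if not data]


def classify_face_results(results: Dict[str, Any]) -> str:
    """Label a parsed response as 'no_faces', 'ok', 'empty' or 'partial'."""
    if FACE_KEY not in results:
        return "no_faces"  # case B: the key is absent
    segments = results[FACE_KEY]
    empty = empty_segments(results)
    if not segments or len(empty) == len(segments):
        return "empty"  # the third kind: key present, nothing inside
    if empty:
        return "partial"  # only some segments are missing their data
    return "ok"  # case A: every segment carries data


if __name__ == "__main__":
    broken = {FACE_KEY: {0: {}, 1: {}}}
    print(classify_face_results(broken), empty_segments(broken))
```

A response labelled 'empty' or 'partial' could then be logged or retried rather than silently treated as a video with no usable face data.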

I am only seeing this problem for videos that have faces in them.

I can confirm that this problem is present in the raw response from Google VI (before it is parsed with the MessageToDict() function), and I'm not sure what is causing it. Below is a link to an example video that exhibits this problem.

https://drive.google.com/file/d/1gsbe20iWp6lD9dH0PNvxvvQFUeB5F_cz/view?usp=sharing

If anyone has seen anything like this before, or has any idea how to fix this, I would greatly appreciate it.


Solution

  • Currently, there is an open issue regarding your concern, here. The engineering team is looking into it, and you can keep track of its progress by following the thread linked above.