linux ffmpeg frame-rate affdex-sdk

Affectiva drops every second frame


I am running the Affectiva SDK 4.0 on a GoPro video recording, using a C++ program on Ubuntu 16.04. The GoPro video was recorded at 60 fps. The problem is that Affectiva only provides results for around half of the frames (i.e. about 30 fps). The last timestamp reported by Affectiva matches the video duration, which means that Affectiva skips roughly every second frame.

Before running Affectiva, I ran ffmpeg with the following command to make sure that the video has a constant frame rate of 60 fps:

ffmpeg -i in.MP4 -y -vcodec libx264 -preset medium -r 60 -map_metadata 0:g -strict -2 out.MP4 </dev/null 2>&1

When I inspect the presentation timestamps with ffprobe -show_entries frame=pict_type,pkt_pts_time -of csv -select_streams v in.MP4, I get the following values for the raw video:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/media/GoPro_concat/GoPro_concat.MP4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.20.100
  Duration: 01:14:46.75, start: 0.000000, bitrate: 15123 kb/s
    Stream #0:0(eng): Video: h264 (Main) (avc1 / 0x31637661), yuvj420p(pc, bt709), 1280x720 [SAR 1:1 DAR 16:9], 14983 kb/s, 59.94 fps, 59.94 tbr, 60k tbn, 119.88 tbc (default)
    Metadata:
      handler_name    :  GoPro AVC
      timecode        : 13:17:26:44
    Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 127 kb/s (default)
    Metadata:
      handler_name    :  GoPro AAC
    Stream #0:2(eng): Data: none (tmcd / 0x64636D74)
    Metadata:
      handler_name    :  GoPro AVC
      timecode        : 13:17:26:44
Unsupported codec with id 0 for input stream 2
frame,0.000000,I
frame,0.016683,P
frame,0.033367,P
frame,0.050050,P
frame,0.066733,P
frame,0.083417,P
frame,0.100100,P
frame,0.116783,P
frame,0.133467,I
frame,0.150150,P
frame,0.166833,P
frame,0.183517,P
frame,0.200200,P
frame,0.216883,P
frame,0.233567,P
frame,0.250250,P
frame,0.266933,I
frame,0.283617,P
frame,0.300300,P
frame,0.316983,P
frame,0.333667,P
frame,0.350350,P
frame,0.367033,P
frame,0.383717,P
frame,0.400400,I
frame,0.417083,P
frame,0.433767,P
frame,0.450450,P
frame,0.467133,P
frame,0.483817,P
frame,0.500500,P
frame,0.517183,P
frame,0.533867,I
frame,0.550550,P
frame,0.567233,P
frame,0.583917,P
frame,0.600600,P
frame,0.617283,P
frame,0.633967,P
frame,0.650650,P
frame,0.667333,I
frame,0.684017,P
frame,0.700700,P
frame,0.717383,P
frame,0.734067,P
frame,0.750750,P
frame,0.767433,P
frame,0.784117,P
frame,0.800800,I
frame,0.817483,P
frame,0.834167,P
frame,0.850850,P
frame,0.867533,P
frame,0.884217,P
frame,0.900900,P
frame,0.917583,P
frame,0.934267,I
frame,0.950950,P
frame,0.967633,P
frame,0.984317,P
frame,1.001000,P
frame,1.017683,P
frame,1.034367,P
frame,1.051050,P
frame,1.067733,I
...

I have uploaded the full output on OneDrive.
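
The frame rows above are spaced about 0.016683 s apart, which matches the 59.94 fps that ffprobe reports rather than exactly 60 fps. As a minimal sketch (the function names parseTimestamps and averageFps are made up for illustration and are not part of ffprobe or Affectiva), the same check can be run on the full CSV output, or on the timestamps Affectiva writes, to see how many frames per second each actually contains:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Extract presentation timestamps from ffprobe CSV rows of the form
// "frame,<pts_time>,<pict_type>". All other lines (stream info, warnings)
// are skipped.
std::vector<double> parseTimestamps(std::istream& in) {
    std::vector<double> timestamps;
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("frame,", 0) != 0) continue;  // not a frame row
        std::istringstream fields(line.substr(6));    // text after "frame,"
        double t;
        if (fields >> t) timestamps.push_back(t);     // parsing stops at the next comma
    }
    return timestamps;
}

// Average frame rate = number of intervals / elapsed time.
// Returns 0.0 when fewer than two timestamps are available.
double averageFps(const std::vector<double>& timestamps) {
    if (timestamps.size() < 2 || timestamps.back() <= timestamps.front()) return 0.0;
    return (timestamps.size() - 1) / (timestamps.back() - timestamps.front());
}
```

Feeding the ffprobe output through parseTimestamps and comparing averageFps against the same figure computed from the Affectiva result file would show directly how large the gap between decoded and processed frames is.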

If I run Affectiva on the raw video (not processed by ffmpeg), I see the same dropped frames. I created the detector with affdex::VideoDetector detector(60);

Is there a problem with the ffmpeg command or with Affectiva?

Edit: I think I have found where the problem could be. It seems that Affectiva does not process the whole video but stops after a certain number of processed frames without any error message. Below is the C++ code I'm using. In the onProcessingFinished() method I print a message to the console when processing is finished. This message is never printed, so Affectiva never reaches the end of the video.

Is there something wrong with my code, or should I encode the videos in a format other than MP4?

#include "VideoDetector.h"
#include "FrameDetector.h"

#include <iostream>
#include <fstream>
#include <map>
#include <string>
#include <mutex>
#include <condition_variable>

std::mutex m;
std::condition_variable conditional_variable;
bool processed = false;

class Listener : public affdex::ImageListener {
public:
    Listener(std::ofstream * fout) {
        this->fout = fout;
    }
    virtual void onImageCapture(affdex::Frame image){
        //std::cout << "called";
    }
    virtual void onImageResults(std::map<affdex::FaceId, affdex::Face> faces, affdex::Frame image){
        //std::cout << faces.size() << " faces detected:" << std::endl;

        // Write one CSV row per detected face for this frame.
        for(auto& kv : faces){
            (*this->fout) << image.getTimestamp() << ",";
            (*this->fout) << kv.first << ",";
            (*this->fout) << kv.second.emotions.joy << ",";
            (*this->fout) << kv.second.emotions.fear << ",";
            (*this->fout) << kv.second.emotions.disgust << ",";
            (*this->fout) << kv.second.emotions.sadness << ",";
            (*this->fout) << kv.second.emotions.anger << ",";
            (*this->fout) << kv.second.emotions.surprise << ",";
            (*this->fout) << kv.second.emotions.contempt << ",";
            (*this->fout) << kv.second.emotions.valence << ",";
            (*this->fout) << kv.second.emotions.engagement << ",";
            (*this->fout) << kv.second.measurements.orientation.pitch << ",";
            (*this->fout) << kv.second.measurements.orientation.yaw << ",";
            (*this->fout) << kv.second.measurements.orientation.roll << ",";
            (*this->fout) << kv.second.faceQuality.brightness << std::endl;

            //std::cout <<  kv.second.emotions.fear << std::endl;
            //std::cout <<  kv.second.emotions.surprise  << std::endl;
            //std::cout <<  (int) kv.second.emojis.dominantEmoji;
        }
    }
private:
    std::ofstream * fout;
};

class ProcessListener : public affdex::ProcessStatusListener{
public:
    virtual void onProcessingException (affdex::AffdexException ex){
        std::cerr << "[Error] " << ex.getExceptionMessage();
    }
    virtual void onProcessingFinished (){
        {
            std::lock_guard<std::mutex> lk(m);
            processed = true;
            std::cout << "[Affectiva] Video processing finised." << std::endl;
        }
        conditional_variable.notify_one();
    }
};

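int main(int argc, char ** argsv)
{
    // Expect the input video path and the output CSV path.
    if (argc < 3) {
        std::cerr << "Usage: " << argsv[0] << " <input video> <output csv>" << std::endl;
        return 1;
    }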
    affdex::VideoDetector detector(60, 1, affdex::FaceDetectorMode::SMALL_FACES);
    //affdex::VideoDetector detector(60, 1, affdex::FaceDetectorMode::LARGE_FACES);
    std::string classifierPath="/home/wrafael/affdex-sdk/data";
    detector.setClassifierPath(classifierPath);
    detector.setDetectAllEmotions(true);

    // Output
    std::ofstream fout(argsv[2]);
    fout << "timestamp" << ",";
    fout << "faceId" << ",";
    fout << "joy" << ",";
    fout << "fear" << ",";
    fout << "disgust" << ",";
    fout << "sadness" << ",";
    fout << "anger" << ",";
    fout << "surprise" << ",";
    fout << "contempt" << ",";
    fout << "valence" << ",";
    fout << "engagement"  << ",";
    fout << "pitch" << ",";
    fout << "yaw" << ",";
    fout << "roll" << ",";
    fout << "brightness" << std::endl;

    Listener l(&fout);
    ProcessListener pl;
    detector.setImageListener(&l);
    detector.setProcessStatusListener(&pl);

    detector.start();
    detector.process(argsv[1]);

    // wait for the worker
    {
        std::unique_lock<std::mutex> lk(m);
        conditional_variable.wait(lk, []{return processed;});
    }
    fout.flush();
    fout.close();
}
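
Because the main thread above waits without a timeout, the program blocks forever if onProcessingFinished() is never called. The following is a minimal sketch, independent of the Affectiva SDK, of a timed wait that makes such a stall visible. The helper name waitForFinish and the choice of timeout are assumptions for illustration only:

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper: blocks until `done` becomes true or `timeout` elapses.
// `done` must only be modified while holding `m`, as in the listener above.
// Returns true if processing finished, false if the wait timed out.
bool waitForFinish(std::mutex& m, std::condition_variable& cv, const bool& done,
                   std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lk(m);
    // wait_for with a predicate returns the predicate's value on exit,
    // so spurious wakeups are handled and false means "timed out".
    return cv.wait_for(lk, timeout, [&done] { return done; });
}
```

In main, the wait could then be replaced by something like if (!waitForFinish(m, conditional_variable, processed, std::chrono::minutes(30))) { std::cerr << "processing stalled" << std::endl; }, where the timeout has to be chosen longer than the expected processing time of the video.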

Edit 2: I have now dug further into the problem and looked at just one GoPro file with a duration of 19 min 53 s (GoPro splits the recordings). When I run Affectiva with affdex::VideoDetector detector(60, 1, affdex::FaceDetectorMode::SMALL_FACES); on that raw video, the following file is produced. Affectiva stops after 906 s without any error message and without printing "[Affectiva] Video processing finised".

When I re-encode the video using ffmpeg -i raw.MP4 -y -vcodec libx264 -preset medium -r 60 -map_metadata 0:g -strict -2 out.MP4 and then run Affectiva with affdex::VideoDetector detector(60, 1, affdex::FaceDetectorMode::SMALL_FACES);, Affectiva runs until the end and prints "[Affectiva] Video processing finised", but the resulting frame rate is only 23 fps. Here is the file.

When I now run Affectiva with affdex::VideoDetector detector(62, 1, affdex::FaceDetectorMode::SMALL_FACES); on this transformed file, Affectiva stops after 509s and "[Affectiva] Video processing finised" is not printed. Here is the file.


Solution

  • If the video frame rate is 60 fps, pass a processing frame rate higher than 60 to the VideoDetector constructor so that all frames are processed. IIRC, using 61 or 62 should give you the correct number of frames.
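
The suggestion above can be sketched as a small helper that rounds the video's nominal frame rate up and adds a margin. The name processingFrameRate and the default margin of 2 are assumptions chosen to reproduce the "61 or 62" figures for 60 fps video. They are not part of the Affectiva SDK:

```cpp
#include <cmath>

// Hypothetical helper: derive a processing frame rate slightly above the
// video's nominal frame rate (e.g. 59.94 or 60 fps gives 62 with margin 2).
int processingFrameRate(double videoFps, int margin = 2) {
    return static_cast<int>(std::ceil(videoFps)) + margin;
}
```

The result would then be passed as the first constructor argument, for example affdex::VideoDetector detector(processingFrameRate(59.94), 1, affdex::FaceDetectorMode::SMALL_FACES);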