c++ multithreading concurrency predicate condition-variable

Not using predicate in condition variable causing slowdown

I am trying to build a camera wrapper that lets other threads retrieve frames while the wrapper handles all capture and processing in a separate thread. I use a condition variable to notify other threads that a new frame is ready, mainly so that no thread gets the same frame twice. In a way, this is a case of the producer-consumer problem.

During initialization, I start a thread to do the capture and processing:

int PiCamera::Init(){
    camera_on_ = true;
    capture_thread_ = std::thread(&PiCamera::RetrieveFrames, this);
    return 0;
}

where RetrieveFrames is (this first version is labeled RetrieveFrames1):

int PiCamera::RetrieveFrames1(){
    while(camera_on_){
        camera_.grab();
        frame_ready_ = true;
        ready_condition_.notify_all(); // Notify on condition variable
    }
    return 0;
}
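In the loop above, camera_on_ is read by the capture thread; if it is a plain bool (the question does not show its type) and is later cleared from another thread to stop capturing, that is a data race. A minimal sketch of a start/stop loop, assuming a hypothetical CaptureLoop class that is not part of the question's code, uses std::atomic<bool> for the run flag and joins the thread on shutdown:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical helper (not from the question): runs `grab_once` in a loop on
// its own thread until Stop() is called.
class CaptureLoop {
public:
    explicit CaptureLoop(std::function<void()> grab_once)
        : grab_once_(std::move(grab_once)) {}

    // Make sure the worker thread is joined before the object goes away.
    ~CaptureLoop() { Stop(); }

    void Start() {
        if (worker_.joinable()) return;  // already running
        running_ = true;
        worker_ = std::thread([this] {
            // The flag is atomic because Stop() writes it from another thread.
            while (running_) {
                grab_once_();  // e.g. camera_.grab() followed by notifying waiters
            }
        });
    }

    void Stop() {
        running_ = false;
        if (worker_.joinable()) {
            worker_.join();  // a joinable std::thread must be joined before destruction
        }
    }

private:
    std::function<void()> grab_once_;
    std::atomic<bool> running_{false};
    std::thread worker_;
};
```

The same idea would apply to camera_on_ and capture_thread_ in PiCamera: make the flag atomic and join the thread when the camera is shut down.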

When a single thread wants a frame, all it needs to call is:

int PiCamera::GetFrame1(cv::Mat &image){
    // Lock mutex
    std::unique_lock<std::mutex> mutex_lock(ready_mutex_);
    ready_condition_.wait(mutex_lock, [this](){return frame_ready_;});
    camera_.retrieve(image);
    frame_ready_ = false;
    // Unlock mutex
    mutex_lock.unlock();
    return 0;
}
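The predicate overload of wait() is what lets GetFrame1 skip blocking when a frame is already available. The C++ standard specifies that cv.wait(lock, pred) behaves like the loop below; the helper name WaitWithPredicate is illustrative only:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Illustrative helper (hypothetical name): equivalent to cv.wait(lock, pred).
template <class Predicate>
void WaitWithPredicate(std::condition_variable& cv,
                       std::unique_lock<std::mutex>& lock,
                       Predicate pred) {
    // pred() is checked BEFORE blocking: if a frame is already marked ready,
    // the call returns at once and never needs a notify.
    while (!pred()) {
        // A plain wait() only returns after a notify that arrives while this
        // thread is blocked (or spuriously); notifications sent earlier are lost.
        cv.wait(lock);
    }
}
```

A condition variable has no memory of past notifications; the predicate, checked against shared state, is what remembers that a frame arrived.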

If two threads call GetFrame, however, each of them only gets alternate frames, because whichever thread wakes first resets frame_ready_ to false. I want any number of threads to be able to get the latest frame as soon as it is available.

This looks like a producer-consumer problem with multiple consumers, except that every consumer should get the latest available data and none should read the same data twice.
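One common way to meet both requirements without giving up the predicate is to replace the shared boolean with a frame counter: the producer increments it under the mutex, and each consumer remembers the last value it saw and waits until the counter differs. A minimal sketch, assuming a hypothetical FrameBroadcast class (not part of the question's code) and leaving out the actual frame copy:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical sketch: one producer publishes frame ids, any number of
// consumers wait for an id newer than the one they last processed.
class FrameBroadcast {
public:
    // Called by the capture thread after each grab().
    void Publish() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++frame_id_;  // written under the same mutex the waiters use
        }
        ready_.notify_all();  // wake every consumer, not just one
    }

    // Called by each consumer with the id of the last frame it processed.
    // Returns immediately if a newer frame already exists; otherwise blocks.
    // Returns the newest id, so intermediate frames may be skipped.
    std::uint64_t WaitForNewer(std::uint64_t last_seen) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [&] { return frame_id_ != last_seen; });
        // A real wrapper would copy the frame here; the frame buffer itself
        // would also need protecting from the producer (not shown).
        return frame_id_;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::uint64_t frame_id_ = 0;
};
```

Because each consumer keeps its own last_seen, one consumer reading a frame does not hide it from the others, which is the problem frame_ready_ has when two threads call GetFrame1.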

Hence, I made the following changes:

int PiCamera::RetrieveFrames2(){
    while(camera_on_){
        camera_.grab();
        // frame_ready_ = true;
        ready_condition_.notify_all(); // Notify on condition variable
    }
    return 0;
}

int PiCamera::GetFrame2(cv::Mat &image){
    // Lock mutex
    std::unique_lock<std::mutex> mutex_lock(ready_mutex_);
    // ready_condition_.wait(mutex_lock, [this](){return frame_ready_;});
    ready_condition_.wait(mutex_lock);
    camera_.retrieve(image);
    // frame_ready_ = false;
    // Unlock mutex
    mutex_lock.unlock();
    return 0;
}

Now two threads can get the same frame, but I noticed that retrieving frames became slower.

The test program I ran looked roughly like this:

#include <sys/time.h>  // gettimeofday, timeval
#include <iostream>

// PiCamera, GetCentroid, ElapsedSec, ArrayMean and NFRAMES come from the rest of the project.
int main(){
    PiCamera camera;  // not `PiCamera camera();`, which declares a function
    camera.Init();
    cv::Point2f centroid_location;
    cv::Mat image;
    float time1[NFRAMES] = {};  // ms spent in GetFrame
    float time2[NFRAMES] = {};  // ms spent processing
    timeval tstart, tend, t1, t2, t3;

    gettimeofday(&tstart, nullptr);  // start the overall timer
    for(int frame=0;frame<NFRAMES;frame++){
        gettimeofday(&t1, nullptr);
        camera.GetFrame(image);  // GetFrame1 or GetFrame2, depending on the test
        gettimeofday(&t2, nullptr);
        time1[frame] = ElapsedSec(t1, t2)*1000;
        GetCentroid(image, centroid_location);
        // Just to increase workload
        GetCentroid(image, centroid_location);
        gettimeofday(&t3, nullptr);
        time2[frame] = ElapsedSec(t2, t3)*1000;
    }

    gettimeofday(&tend, nullptr);
    float total_time = ElapsedSec(tstart, tend);
    float fps = (float)NFRAMES/total_time;
    std::cout << "Camera took " << total_time << " seconds at " << fps << " FPS\n";
    std::cout << "t1 " << ArrayMean(time1, NFRAMES) << " t2 " << ArrayMean(time2, NFRAMES) << '\n';
    return 0;
}
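ElapsedSec and ArrayMean are not shown in the question. The sketch below is a guess at what they compute, with hypothetical definitions that match how the test program calls them. Note that gettimeofday() reports wall-clock time, which can jump if the system clock is adjusted; std::chrono::steady_clock is the monotonic alternative in standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/time.h>  // timeval (POSIX)

// Hypothetical helper: seconds elapsed from `start` to `end`.
float ElapsedSec(const timeval& start, const timeval& end) {
    return static_cast<float>(end.tv_sec - start.tv_sec) +
           static_cast<float>(end.tv_usec - start.tv_usec) / 1e6f;
}

// Hypothetical helper: arithmetic mean of the first `n` values.
float ArrayMean(const float* values, std::size_t n) {
    if (n == 0) return 0.0f;
    double sum = 0.0;  // accumulate in double to limit rounding error
    for (std::size_t i = 0; i < n; ++i) sum += values[i];
    return static_cast<float>(sum / static_cast<double>(n));
}
```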

The camera can grab frames at 120 FPS, so I was hoping to process them at 120 FPS as well.

When I run the program using RetrieveFrames1 and GetFrame1, I get:

Camera took 8.39579 seconds at 119.107 FPS
t1 0.367703 t2 8.02553

However, when I run the same test using RetrieveFrames2 and GetFrame2, I get:

Camera took 13.7088 seconds at 72.946 FPS
t1 4.74955 t2 8.95662

Even if I call GetCentroid just once, I get the following results:

Camera took 8.29365 seconds at 120.574 FPS
t1 3.98624 t2 4.30591

and

Camera took 8.94322 seconds at 111.817 FPS
t1 4.43984 t2 4.50159
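Converting the 120 FPS target into a per-frame budget and adding the two reported means (t1 is time spent in GetFrame, t2 is time spent processing, both in ms) makes the runs easier to compare. The sums agree with total time divided by frame count, assuming NFRAMES is 1000, which is what the reported seconds times FPS works out to.

```latex
\begin{align*}
T_{\text{frame}} &= \frac{1000\ \text{ms}}{120} \approx 8.33\ \text{ms}\\
\text{RetrieveFrames1/GetFrame1, two calls:}\quad t_1 + t_2 &= 0.368 + 8.026 \approx 8.39\ \text{ms} \approx T_{\text{frame}}\\
\text{RetrieveFrames2/GetFrame2, two calls:}\quad t_1 + t_2 &= 4.750 + 8.957 \approx 13.71\ \text{ms} \approx 1.65\,T_{\text{frame}}\\
\text{single GetCentroid call:}\quad t_1 + t_2 &= 3.986 + 4.306 \approx 8.29\ \text{ms}, \qquad 4.440 + 4.502 \approx 8.94\ \text{ms}
\end{align*}
```

One reading consistent with these numbers (an interpretation, not something the measurements prove): in the two-call run without a predicate, t2 is about 8.96 ms, longer than one frame period, so the notify_all() for the next frame often fires while the consumer is still processing. A plain wait() does not see a notification sent before it started waiting, so the consumer blocks until a later frame. With the predicate, frame_ready_ is already true in that case and wait() returns immediately, which fits t1 of about 0.37 ms.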

Why do my threads take so much longer to wait on the condition variable when the only thing I changed was removing the predicate?

Solution

In case anyone comes across this, I ended up using https://github.com/rigtorp/MPMCQueue to abstract away the logic of transferring data between the producer and consumer threads.