c++ opencv webcam-capture dlib

Dlib webcam capture with face detection and shape prediction is slow


I am working on a C++ program that should detect faces in a webcam stream, then crop them using face landmarks and swap them.

I implemented face detection with OpenCV's Viola-Jones detector, and it works fine. Then I looked for a way to segment just the face from the ROI. I tried a few skin detection implementations, but none was successful.

Then I found dlib's face landmarks and decided to try them. Right at the beginning I ran into problems, because I had to convert cv::Mat to cv_image, Rect to rectangle, etc. So I tried to do as much as possible with dlib alone: I grab the stream using cv::VideoCapture and then show the captured frame in a dlib image_window. But this turned out to be really slow.

The code I used is below. The commented-out lines do the same thing using OpenCV. The OpenCV version is much faster and smoother than the uncommented code, which runs at about 5 FPS. That's horrible. I can't imagine how slow it will be once I add face detection and face landmarks.

Am I doing something wrong? How can I make it faster? Or should I use OpenCV for video capture and showing?

cv::VideoCapture cap;
image_window output_frame;

if (!cap.open(0))
{
    cout << "ERROR: Opening video device 0 FAILED." << endl;
    return -1;
}

cv::Mat cap_frame;
//HWND hwnd;
do
{
    cap >> cap_frame;

    if (!cap_frame.empty())
    {
        cv_image<bgr_pixel> dlib_frame(cap_frame);
        output_frame.set_image(dlib_frame);
        //cv::imshow("output",dlib::toMat(dlib_frame));
    }

    //if (27 == char(cv::waitKey(10)))
    //{
    //  return 0;
    //}

    //hwnd = FindWindowA(NULL, "output");
} while (!output_frame.is_closed()); // OpenCV variant: while (hwnd != NULL);

EDIT: After switching to Release mode, displaying the captured frames is fine. But I went on and tried face detection and shape prediction with dlib, just like in the example at http://dlib.net/face_landmark_detection_ex.cpp.html. It was quite laggy. So I turned off shape prediction. Still laggy.

So I assumed face detection was slowing it down. I switched to face detection using OpenCV, because it performed significantly better than the dlib detector. I needed to convert the detected cv::Rect to dlib::rectangle. I used this:

std::vector<dlib::rectangle> dlib_rois;
long l, t, r, b;

for (int i = cv_rois.size() - 1; i >= 0; i--)
{
    l = cv_rois[i].x;
    t = cv_rois[i].y;
    r = cv_rois[i].x + cv_rois[i].width - 1;   // dlib::rectangle edges are inclusive
    b = cv_rois[i].y + cv_rois[i].height - 1;
    dlib_rois.push_back(dlib::rectangle(l, t, r, b));
}

But this combination of OpenCV face detection and dlib shape prediction became brutally laggy. It takes about 4 s to process a single frame.

I can't figure out why. OpenCV face detection alone was absolutely fine, and dlib shape prediction doesn't seem to be that expensive. Can somebody help me with this?
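
One way to narrow this down before guessing is to time each stage (capture, conversion, detection, shape prediction, display) separately. A minimal sketch using only the standard library; time_ms is a hypothetical helper name, and the callable passed to it stands in for whichever stage is being measured:

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `stage` once and returns the elapsed
// wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& stage)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(stage)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Possible use inside the capture loop (names taken from the code above):
//   double ms = time_ms([&] { output_frame.set_image(dlib_frame); });
//   cout << "display: " << ms << " ms" << endl;
```

Printing one number per stage should make it clear whether the 4 s is spent in detection, in shape prediction, or somewhere else such as the conversions.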


Solution

  • You can take several actions to make Dlib run faster, before assuming that it is slow. You only have to read more documentation and try.

    • Dlib is capable of detecting faces in very small areas (80x80 pixels). You are probably sending raw webcam frames at approximately 1280x720 resolution, which is not necessary. From my experience, I recommend reducing the frames to about a quarter of the original width and height. Yes, 320x180 is fine for Dlib. As a consequence you should get roughly a 4x speedup.

    • As mentioned in the comments, turning on compiler optimizations when building Dlib gives a significant improvement in speed.

    • Dlib works faster with grayscale images. You do not need color in the webcam frame. You can use OpenCV to convert the already downsized frame to grayscale.

    • Dlib takes its time finding faces but is extremely fast at finding landmarks on them. If your webcam provides a high frame rate (24-30 fps), you can skip some frames, because faces normally don't move that much from one frame to the next.

    Given these optimizations, I am confident you will get at least 12x faster detection.
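
The resize and frame-skipping advice above leaves two bookkeeping details open: crops and landmarks are usually wanted in full-resolution coordinates, so rectangles detected on the reduced frame have to be scaled back up, and something has to decide on which frames the detector reruns. A minimal sketch of both, using only the standard library. Box, scale_to_full_res and should_detect are hypothetical names, Box merely stands in for dlib::rectangle (inclusive edges), and the 0.25 factor and the interval of 3 are illustrative assumptions, not values prescribed by dlib:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for dlib::rectangle: all four edges are inclusive.
struct Box
{
    long left, top, right, bottom;
};

// Maps a box found on a frame shrunk by `scale` (e.g. 0.25 after
// cv::resize(frame, small, cv::Size(), 0.25, 0.25)) back to
// full-resolution pixel coordinates.
Box scale_to_full_res(const Box& small, double scale)
{
    assert(scale > 0.0 && scale <= 1.0);
    const double up = 1.0 / scale;
    return Box{
        std::lround(small.left * up),
        std::lround(small.top * up),
        // The inclusive right/bottom pixel expands to a block of `up`
        // pixels; take the last pixel of that block.
        std::lround((small.right + 1) * up) - 1,
        std::lround((small.bottom + 1) * up) - 1
    };
}

// Returns true on the frames where the expensive face detector should run.
// On the other frames the most recent boxes can be reused as input to
// shape prediction.
bool should_detect(unsigned long frame_index, unsigned long interval)
{
    assert(interval > 0);
    return frame_index % interval == 0;
}
```

With an interval of 3 on a 30 fps webcam, the detector runs about ten times per second, while shape prediction can still run on every frame using the most recent rectangles.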