c++ opencv video-processing video-capture preview

Too High CPU Footprint of OpenCV Text Overlay on FHD Video Stream


I want to display an FHD live stream (25 fps) and overlay some (changing) text. For this I essentially use the code below.

Basically it is

  1. Load frame
  2. (cv::putText skipped here)
  3. Display frame if it's a multiple of delay

but the code is very slow compared to e.g. mpv and consumes way too much CPU time (cv::useOptimized() == true).

So far, delay is the awkward tuning parameter I use to keep the load feasible.

  • delay == 1 results in 180 % CPU usage (full frame-rate)
  • delay == 5 results in 80 % CPU usage

But delay == 5 (i.e. 5 fps) is really sluggish and still causes too much CPU load.

How can I make this code faster, or otherwise solve the task? (I'm not bound to OpenCV.)

P.S. Without cv::imshow the CPU usage is less than 30 %, regardless of delay.

#include <opencv2/opencv.hpp>
#include <X11/Xlib.h>

// process every delay-th frame
#define delay 5

Display* disp = XOpenDisplay(NULL);
Screen*  scrn = DefaultScreenOfDisplay(disp);
int screen_height = scrn->height;
int screen_width  = scrn->width;

int main(int argc, char** argv){

  cv::VideoCapture cap("rtsp://url");
  cv::Mat frame;

  if (cap.isOpened())
    cap.read(frame);

  cv::namedWindow(  "PREVIEW", cv::WINDOW_NORMAL );
  cv::resizeWindow( "PREVIEW", screen_width, screen_height );

  int framecounter = 0;
  while (true){

    if (cap.isOpened()){

      cap.read(frame);
      framecounter += 1;

      // Display only delay'th frame
      if (framecounter % delay == 0){
        /*
         * cv::putText
         */
        framecounter = 0;
        cv::imshow("PREVIEW", frame);
      }

    }
    cv::waitKey(1);
  }
}

Solution

  • I found out about valgrind (from the repository) and gprof2dot (pip3 install --user gprof2dot):

    valgrind --tool=callgrind /path/to/my/binary    # Produced file callgrind.out.157532
    gprof2dot --format=callgrind --output=out.dot callgrind.out.157532
    dot -Tpdf out.dot -o graph.pdf
    

    That produced a wonderful graph showing that over 60 % of the time is spent in cvResize. And indeed, when I comment out cv::resizeWindow, the CPU usage drops from 180 % to about 60 %.

    Since the screen has a resolution of 1920 x 1200 and the stream 1920 x 1080, the resize essentially did nothing but burn CPU cycles.
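To put a rough number on that, here is a minimal sketch of the arithmetic. It assumes the window content is scaled to fill the full 1920 x 1200 area on every displayed frame. The helper names `needsResample` and `scaledPixelsPerSecond` are hypothetical and not part of OpenCV.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if a frame has to be resampled to fill the window.
inline bool needsResample(int frame_w, int frame_h, int window_w, int window_h) {
    return frame_w != window_w || frame_h != window_h;
}

// Hypothetical helper: number of output pixels a scaler has to produce per
// second when every displayed frame is resized to the window size.
inline std::int64_t scaledPixelsPerSecond(int window_w, int window_h, int fps) {
    assert(window_w > 0 && window_h > 0 && fps > 0);
    return static_cast<std::int64_t>(window_w) * window_h * fps;
}
```

With the numbers from this post, `needsResample(1920, 1080, 1920, 1200)` is true, and `scaledPixelsPerSecond(1920, 1200, 25)` gives 57,600,000 output pixels per second, all of it spent stretching 1080 rows to 1200.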

    So far, this is still fragile: as soon as I switch to full-screen mode and back, the CPU load returns to 180 %.

    To fix this, it turned out that I can either disable resizing completely with cv::WINDOW_AUTOSIZE ...

    cv::namedWindow( "PREVIEW", cv::WINDOW_AUTOSIZE );
    

    ... or -- as Micka suggested -- on OpenCV versions compiled with OpenGL support (-DWITH_OPENGL=ON, my Debian repository version was not), use ...

    cv::namedWindow( "PREVIEW", cv::WINDOW_OPENGL );
    

    ... to offload the rendering to the GPU, which turns out to be even faster together with resizing (55 % CPU compared to 65 % for me). It just does not seem to work together with cv::WINDOW_KEEPRATIO. [*]

    Furthermore, it turns out that cv::UMat can be used as a drop-in replacement for cv::Mat, which additionally boosts the performance (as seen by ps -e -o pcpu,args):

    [Figure: Mat-UMat performance profile]


    Appendix

    [*] So we have to manually scale it and take care of the aspect ratio.

    // image_width and image_height are the frame dimensions (e.g. frame.cols and frame.rows)
    float screen_aspratio = (float) screen_width / screen_height;
    float image_aspratio  = (float) image_width  / image_height;
    
    if ( image_aspratio >= screen_aspratio ) { // width limited, center window vertically
      int window_height = screen_width / image_aspratio;
      cv::resizeWindow("PREVIEW", screen_width, window_height );
      cv::moveWindow(  "PREVIEW", 0, (screen_height - window_height) / 2 );
    }
    else { // height limited, center window horizontally
      int window_width = screen_height * image_aspratio;
      cv::resizeWindow("PREVIEW", window_width, screen_height );
      cv::moveWindow(  "PREVIEW", (screen_width - window_width) / 2, 0 );
    }
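
The same computation can also be kept separate from the OpenCV calls, which makes it easy to check with concrete numbers. This is only a sketch: the `WindowRect` struct and the `fitToScreen` function are hypothetical names, not OpenCV API, and the offsets are derived from the scaled window size so that the window is centered even when the frame is smaller or larger than the screen.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type: position and size of the preview window.
struct WindowRect {
    int x, y, width, height;
};

// Fit an image into the screen while preserving the image aspect ratio,
// centering the window along the axis that is not fully used.
inline WindowRect fitToScreen(int screen_w, int screen_h, int image_w, int image_h) {
    assert(screen_w > 0 && screen_h > 0 && image_w > 0 && image_h > 0);
    const double screen_ratio = static_cast<double>(screen_w) / screen_h;
    const double image_ratio  = static_cast<double>(image_w) / image_h;

    WindowRect r{0, 0, screen_w, screen_h};
    if (image_ratio >= screen_ratio) {
        // Width limited: use the full screen width, center vertically.
        r.height = static_cast<int>(std::lround(screen_w / image_ratio));
        r.y = (screen_h - r.height) / 2;
    } else {
        // Height limited: use the full screen height, center horizontally.
        r.width = static_cast<int>(std::lround(screen_h * image_ratio));
        r.x = (screen_w - r.width) / 2;
    }
    return r;
}
```

For the 1920 x 1080 stream on the 1920 x 1200 screen, `fitToScreen(1920, 1200, 1920, 1080)` yields a 1920 x 1080 window at position (0, 60). The result can then be passed to cv::resizeWindow and cv::moveWindow.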