Tags: ffmpeg, video-streaming, libav, libavcodec, libavformat

libavcodec initialization to achieve real time playback with frame dropping when necessary


I have a C++ computer vision application, linked against the ffmpeg libraries, that provides frames from video streams to analysis routines. The idea is that one can provide a moderately generic video stream identifier, and that video source will be decompressed and passed frame by frame to an analysis routine (which runs the user's analysis functions). The "moderately generic video identifier" covers three generic video stream types: paths to video files on disk, IP video streams (cameras or video streaming services), and USB webcam pins with a desired format and rate.

My current video player is as generic as possible: video only, ignoring audio and other streams. It has a switch case for retrieving a stream's frame rate based on the stream's source and codec, which is used to estimate the delay between decompressing frames. I've had many issues trying to get reliable timestamps from the streams, so I am currently ignoring pts and dts. I know ignoring pts/dts is bad for variable frame rate streams; I plan to special-case them later. The player currently checks whether the last decompressed frame is more than 2 frames late (assuming a constant frame rate), and if so "drops the frame", that is, does not pass it to the user's analysis routine.
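To make the lateness check described above concrete, here is a minimal sketch of it in plain C++. It is not the actual player code: the `FrameDropPolicy` type, its field names, and the idea of timing frames against a single start time are assumptions made for illustration, and it assumes a constant frame rate exactly as stated above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper (not part of ffmpeg): decides whether a decoded frame
// is too late to be worth passing to the analysis routine.
struct FrameDropPolicy {
    using Clock = std::chrono::steady_clock;

    double fps;               // assumed constant frame rate of the stream
    Clock::time_point start;  // wall-clock time at which frame 0 was due
    double max_late_frames;   // tolerance in frame periods (2 in the text above)

    // Returns true when frame `index` is more than `max_late_frames`
    // frame periods behind its ideal presentation time.
    bool should_drop(std::int64_t index, Clock::time_point now) const {
        assert(fps > 0.0);
        const std::chrono::duration<double> frame_period(1.0 / fps);
        const auto due = start + std::chrono::duration_cast<Clock::duration>(
                                     frame_period * static_cast<double>(index));
        const std::chrono::duration<double> lateness = now - due;
        return lateness > frame_period * max_late_frames;
    }
};
```

In a decode loop one would call `should_drop(frame_index, FrameDropPolicy::Clock::now())` after each decoded frame and skip the analysis call when it returns true. Note that the frame has already been decoded at that point, so only the analysis time is saved.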

Essentially, the video player's logic determines when to skip frames (not pass them to the time-consuming analysis routine) so that the analysis is fed video frames as close to real time as possible.

I am looking for examples or discussions of how one can initialize and/or maintain the AVFormatContext, AVStream, and AVCodecContext using (presumably, but not limited to) AVDictionary options such that frame dropping, as necessary to maintain real time, is performed at the libav library level and not at my video player level. If achieving this requires separate AVDictionaries (or more) for each stream type and codec, then so be it. I am interested in understanding the pros and cons of both approaches: dropping frames at the player level or at the libav level.

(When some analysis requires every frame, the existing player implementation with frame dropping disabled is fine. I suspect if I can get frame dropping to occur at the libav level, I'll save the packet to frame decompression time as well, reducing the processing more than my current version.)


Solution

  • if I can get frame dropping to occur at the libav level, I'll save the packet to frame decompression time as well

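    No, you won't, unless you are willing to drop all frames until the next key frame. Most frames in a compressed stream are decoded relative to earlier frames, so a skipped frame leaves the frames that depend on it undecodable. On typical mp4 video, the wait for the next key frame could easily be a few seconds.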

    You can skip colorspace conversion and resize, but often these are taken care of by the player.
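The key frame point in the answer can be illustrated with a small simulation. This is only a sketch under a simplifying assumption: every non-key frame is treated as depending on the frame before it, which ignores non-reference frames that a real decoder may be able to discard on their own. Both function names are hypothetical, and nothing here calls ffmpeg. For reference, libavcodec's decoder-level control for this kind of skipping is the `skip_frame` field of `AVCodecContext` (for example `AVDISCARD_NONKEY`), which is not used in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: is_key[i] is true when frame i is a key frame.
// If decoding is skipped starting at `first_skipped`, every frame up to
// (but not including) the next key frame cannot be decoded either.
std::size_t frames_lost_until_keyframe(const std::vector<bool>& is_key,
                                       std::size_t first_skipped) {
    assert(first_skipped < is_key.size());
    std::size_t lost = 0;
    for (std::size_t i = first_skipped; i < is_key.size(); ++i) {
        // A later key frame restarts decoding, so the loss ends there.
        if (i > first_skipped && is_key[i]) break;
        ++lost;
    }
    return lost;
}

// Worst-case gap in seconds when a whole key frame interval is skipped.
double worst_case_gap_seconds(std::size_t keyframe_interval, double fps) {
    assert(fps > 0.0);
    return static_cast<double>(keyframe_interval) / fps;
}
```

With illustrative values of a 250-frame key frame interval at 25 fps, `worst_case_gap_seconds(250, 25.0)` gives 10 seconds, which is the scale of gap the answer warns about.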