ffmpeg, mp4, rtsp, rtp, muxer

How to write a video stream containing B-frames and no DTS to an MP4 container?


I want to save an H.264 video stream received from an RTSP source to an MP4 container. Unlike other questions asked on SO, the challenges I face here are:

  • The stream contains B-frames.

  • The stream only has the PTS given by RTP/RTCP.

Here is my code:

//  ffmpeg
    pkt->data = ..;
    pkt->size = ..;
    pkt->flags = bKeyFrame? AV_PKT_FLAG_KEY : 0;    
    pkt->dts = AV_NOPTS_VALUE;
    pkt->pts = PTS;

    // PTS is based on epoch microseconds so I ignored re-scaling.
    //av_packet_rescale_ts(pkt, { 1, AV_TIME_BASE }, muxTimebase);

    auto ret = av_interleaved_write_frame(m_pAVFormatCtx, pkt);

I received a lot of error messages like this: "Application provided invalid, non monotonically increasing dts to muxer ...".

Result: the MP4 file is playable in VLC, but the frame rate is only half of the original and the video duration is incorrect (VLC shows a weird number).

So how do I set correct DTS and PTS before sending to the container?

Update: I have tried some changes, without success so far, but I found that the frame rate drops because the muxer discards frames that have an incorrect DTS. Additionally, if the starting PTS and DTS values are too large, some players such as VLC delay for some time before showing video.


Solution

  • I ran several experiments and have some findings to share.

    1. Regardless of whether the stream has B-frames, the MP4 muxer needs timestamps that satisfy (at least) the following:

      • DTS is monotonically increasing.
      • DTS <= PTS for each frame.
      • PTS and DTS should start from values close to zero (otherwise players such as VLC delay for some time before displaying video).
    2. If there are no B-frames in the stream, DTS can simply be copied from PTS, and the frames can be saved to an MP4 file without any issue.

    3. If there are B-frames in the stream, the story is totally different. In this case, the PTS values of the frames, in the order they arrive, are not monotonically increasing because of the B-frames. Hence, simply setting DTS = PTS won't work. We have to obtain DTS either by sending it out-of-band or by calculating it from the FPS and PTS.

    Sending DTS out-of-band is quite complicated because it requires handling both the RTSP server and the RTSP client. Here I just want to show the simple way of deducing DTS from FPS and PTS.

    The rough steps are as follows:

    Detect the average duration between frames (or the FPS)

    • Parse the FPS from the SDP of the RTSP session. This depends on the RTSP server: some provide it, others do not.

    • Another way is to calculate the average duration from a sequence of frames. Buffer a number of frames equal to the size of one GOP, take the difference between the largest and smallest PTS in that GOP, and divide it by the number of frame intervals (the number of frames minus one) to get the average duration. For example, at 30 FPS the calculated average duration should be approximately 33,333 µs.
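    The GOP-based estimate can be sketched without any FFmpeg calls. This is only a minimal sketch: `EstimateAvgDuration` is a hypothetical helper name, and it assumes the buffered PTS values all belong to one GOP with no lost frames. Because packets arrive in decode order, it uses the smallest and largest PTS rather than the first and last packet, and divides by the number of intervals.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of FFmpeg): estimates the average frame
// duration from the PTS values of one buffered GOP. The result is in the
// same unit as the input PTS. Returns 0 if fewer than two frames are given.
int64_t EstimateAvgDuration(const std::vector<int64_t>& gopPts)
{
    if (gopPts.size() < 2) {
        return 0;
    }

    // With B-frames the packets are in decode order, so the first and last
    // packets do not necessarily carry the smallest and largest PTS.
    const auto minMax = std::minmax_element(gopPts.begin(), gopPts.end());
    const int64_t span = *minMax.second - *minMax.first;

    // N frames span N - 1 frame intervals.
    const int64_t intervals = static_cast<int64_t>(gopPts.size()) - 1;

    // Integer division, rounded to the nearest value.
    return (span + intervals / 2) / intervals;
}
```

    For example, four frames at roughly 30 FPS arriving with PTS 0, 99999, 33333, 66666 (in µs) give an estimate of 33333.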

    Saving to the container

    // Initialize container

    pAVStream->time_base = { 1, AV_TIME_BASE }; // PTS/DTS in microseconds.
    // Note: in recent FFmpeg releases oformat points to a const struct,
    // so the next line may not compile there.
    m_pAVFormatCtx->oformat->flags |= AVFMT_SEEK_TO_PTS;
    ret = avformat_write_header(m_pAVFormatCtx, &priv_opts);

    // Assume that you have pre-calculated the average duration:
    nAvgDuration = 33'333LL;

    // Per frame

    if (waitingForTheFirstKeyFrame) {
        if (!bKeyFrame) {
            return false; // skip everything before the first key frame
        }

        waitingForTheFirstKeyFrame = false;
        nPTSOffset = nPTS; // pts will start from 0
        nStartDTS = nPTS - nAvgDuration; // dts will start from -nAvgDuration
    }

    nDTS = nStartDTS;
    nStartDTS += nAvgDuration; // dts is monotonically increasing

    pkt->pts = nPTS - nPTSOffset;
    pkt->dts = nDTS - nPTSOffset;

    // Since PTS/DTS are in microseconds, no further rescaling is needed.
    // Of course, you can use a different time_base.

    auto ret = av_interleaved_write_frame(m_pAVFormatCtx, pkt);
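
    To see what timestamps the per-frame logic above produces, it can be exercised without FFmpeg. The sketch below is an illustration only: `DtsGenerator`, `Timestamps` and `IsValidForMuxer` are hypothetical names, the key-frame wait is left out (the first call is assumed to be a key frame), and the validity check covers only the first two rules from item 1.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Timestamps {
    int64_t pts;
    int64_t dts;
};

// Hypothetical helper mirroring the per-frame arithmetic: DTS starts one
// average duration before the first PTS and then advances by a fixed step.
class DtsGenerator {
public:
    explicit DtsGenerator(int64_t avgDuration) : m_avgDuration(avgDuration) {}

    // Call once per frame in arrival (decode) order, starting at a key frame.
    Timestamps Next(int64_t pts)
    {
        if (!m_started) {
            m_started = true;
            m_ptsOffset = pts;               // output PTS starts from 0
            m_nextDts = pts - m_avgDuration; // output DTS starts from -avgDuration
        }
        const int64_t dts = m_nextDts;
        m_nextDts += m_avgDuration;          // DTS increases by a fixed step
        return { pts - m_ptsOffset, dts - m_ptsOffset };
    }

private:
    int64_t m_avgDuration;
    int64_t m_ptsOffset = 0;
    int64_t m_nextDts = 0;
    bool m_started = false;
};

// Checks two of the rules listed earlier: DTS strictly increasing and
// DTS <= PTS for every frame.
bool IsValidForMuxer(const std::vector<Timestamps>& frames)
{
    for (std::size_t i = 0; i < frames.size(); ++i) {
        if (frames[i].dts > frames[i].pts) {
            return false;
        }
        if (i > 0 && frames[i].dts <= frames[i - 1].dts) {
            return false;
        }
    }
    return true;
}
```

    With a decode order of I, P, B, B (presentation slots 0, 3, 1, 2) the generated timestamps pass the check. With deeper reordering, for example slots 0, 4, 2, 1, 3, the one-frame head start is not enough and one frame ends up with DTS > PTS, so a larger initial offset would presumably be needed for such streams.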
    

    Caution:

    This solution assumes that the original PTS values of the stream (on the server side) increase monotonically, that there are no gaps between frames, and that no frames are lost. Otherwise, the DTS may become less accurate, or the MP4 file may not play at all.
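
    One way to notice when this assumption is violated is to check the PTS spacing of each buffered GOP before writing it. The following is a rough sketch, not something tested against real streams: `HasEvenSpacing` and its `tolerance` parameter are hypothetical, and it assumes all PTS values passed in belong to a single GOP.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if the PTS values of one GOP, once
// sorted into presentation order, are spaced by avgDuration +/- tolerance.
// A larger step suggests a gap or a lost frame.
bool HasEvenSpacing(std::vector<int64_t> gopPts, int64_t avgDuration, int64_t tolerance)
{
    // Packets arrive in decode order; sort a copy into presentation order.
    std::sort(gopPts.begin(), gopPts.end());

    for (std::size_t i = 1; i < gopPts.size(); ++i) {
        const int64_t delta = gopPts[i] - gopPts[i - 1];
        if (delta < avgDuration - tolerance || delta > avgDuration + tolerance) {
            return false;
        }
    }
    return true;
}
```

    If the check fails, the fixed-step DTS will drift away from the real timing, so the caller might re-estimate the average duration or restart the offsets at the next key frame.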