Tags: android, video, gstreamer, qgroundcontrol

GStreamer mp4mux gives "Buffer has no PTS" error using custom appsrc


I have a pipeline coded in C++ that looks like this:

appsrc do-timestamp=TRUE is-live=TRUE caps=
"video/x-h264, stream-format=(string)byte-stream, alignment=(string)none, framerate=(fraction)0/1" min-latency=300000000 ! h264parse ! video/x-h264, stream-format=(string)avc, alignment=(string)au ! tee name=t \
t. ! queue ! valve drop=FALSE ! decodebin ! glupload ! glcolorconvert ! qtsink sync=FALSE \
t. ! queue ! valve drop=FALSE ! mp4mux reserved-max-duration=3600000000000 reserved-moov-update-period=10000000000 ! filesink sync=FALSE location="....../out.mp4"
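
(For quick experiments, the same pipeline can be brought up straight from that description with gst_parse_launch(). This is only a sketch with a placeholder output path; my app actually creates and links the elements programmatically, as shown further down.)

GError* error = nullptr;
GstElement* pipeline = gst_parse_launch(
    "appsrc name=artosyn_source do-timestamp=TRUE is-live=TRUE "
    "caps=\"video/x-h264, stream-format=(string)byte-stream, alignment=(string)none, framerate=(fraction)0/1\" "
    "min-latency=300000000 ! h264parse ! "
    "video/x-h264, stream-format=(string)avc, alignment=(string)au ! tee name=t "
    "t. ! queue ! valve drop=FALSE ! decodebin ! glupload ! glcolorconvert ! qtsink sync=FALSE "
    "t. ! queue ! valve drop=FALSE ! mp4mux reserved-max-duration=3600000000000 "
    "reserved-moov-update-period=10000000000 ! filesink sync=FALSE location=out.mp4",
    &error);
if (!pipeline) {
    g_printerr("Failed to build pipeline: %s\n", error->message);
    g_clear_error(&error);
}
// The appsrc is then looked up by name so buffers can be pushed into it
GstElement* appsrc = gst_bin_get_by_name(GST_BIN(pipeline), "artosyn_source");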

appsrc injects the video coming from a drone’s USB wireless video receiver into the pipeline.

Some more context:

  • The USB receiver hardware gives us 512-byte chunks of non-timestamped raw Annex-B h.264 video
  • The framerate should be 60 fps, but in practice it rarely keeps up and varies with the signal strength (hence framerate=(fraction)0/1, and the reason neither qtsink nor filesink is synchronized to the pipeline (sync=FALSE))
  • The hardware introduces a minimum of 300 ms of latency, as set in appsrc
  • appsrc is automatically timestamping my buffers (do-timestamp=TRUE)
  • I’m using mp4mux reserved-max-duration and reserved-moov-update-period to prevent app crashes from breaking the mp4 files
  • I’m using GStreamer 1.18.4 for Android

Video recording works fine when the drone is not airborne. But when it takes off, after about 15 seconds of correct video recording the mp4mux element fails with the message "Buffer has no PTS". Some users have consistently reported this, but I can't reproduce it (it requires flying a drone that I don't have), and it doesn't make a lot of sense to me. My guess so far is that at that particular moment there is some congestion in the wireless video link, some video packets are held up for a few milliseconds, and that might be causing the trouble.

Here's the (simplified) code that creates appsrc:

GstCaps* pCaps = nullptr;
GstStructure* pConfig = nullptr;

_pAppSrc = gst_element_factory_make("appsrc", "artosyn_source");
gpointer pAppSrc = static_cast<gpointer>(_pAppSrc);

// Retain one extra ref, so the source is destroyed
// in a controlled way
gst_object_ref(_pAppSrc);

pCaps = gst_caps_from_string("video/x-h264, stream-format=(string)byte-stream, alignment=(string)none, framerate=(fraction)0/1");
g_object_set(G_OBJECT(pAppSrc), "caps", pCaps,
                                "is-live", TRUE,
                                "min-latency", G_GINT64_CONSTANT(300000000),
                                "format", GST_FORMAT_TIME,
                                "do-timestamp", TRUE,
                                nullptr);

// Pool of fixed 512-byte buffers for the USB transfers
_pBufferPool = gst_buffer_pool_new();

pConfig = gst_buffer_pool_get_config(_pBufferPool);

static const guint kBufferSize  = 512;
static const guint kPoolSize    = 0x400000;
static const guint kPoolSizeMax = 0x600000;

const guint nBuffersMin = kPoolSize / kBufferSize;
const guint nBuffersMax = kPoolSizeMax / kBufferSize;

gst_buffer_pool_config_set_params(pConfig, pCaps, kBufferSize, nBuffersMin, nBuffersMax);

gst_buffer_pool_set_config(_pBufferPool, pConfig);
gst_buffer_pool_set_active(_pBufferPool, TRUE);

gst_caps_unref(pCaps);
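
The acquisition side isn't shown above. Conceptually, before each USB transfer is submitted, a buffer is taken from the pool and mapped for writing, roughly like this (a sketch; b is the per-transfer bookkeeping struct that also appears in the push code below):

// Sketch: acquire a 512-byte buffer from the pool and map it so the
// USB driver can write straight into it
GstBuffer* pBuffer = nullptr;
if (gst_buffer_pool_acquire_buffer(_pBufferPool, &pBuffer, nullptr) == GST_FLOW_OK) {
    GstMapInfo mapInfo;
    if (gst_buffer_map(pBuffer, &mapInfo, GST_MAP_WRITE)) {
        b->pBuffer = pBuffer;
        b->mapInfo = mapInfo;
        // mapInfo.data / mapInfo.size are handed to libusb as the
        // transfer buffer; the buffer is unmapped on completion
    }
}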

When a new buffer is filled up by the USB driver, it’s pushed into the pipeline like this:

bool unref = true;  // Unref by default; gst_app_src_push_buffer() takes ownership on success

gst_buffer_unmap(b->pBuffer, &b->mapInfo);
gst_buffer_set_size(b->pBuffer, xfer.pXfer->actual_length);

if(result == LIBUSB_TRANSFER_COMPLETED)
{
    //-- DROP DATA IF NOT IN PLAYING STATE --
    GstState st, pend;
    GstStateChangeReturn scr = gst_element_get_state(GST_ELEMENT(_pAppSrc), &st, &pend, GST_CLOCK_TIME_NONE);
    Q_UNUSED(scr)
    bool drop = (st != GST_STATE_PLAYING);

    if(!drop)
    {
        GstFlowReturn ret = GST_FLOW_OK;

        // Push into pipeline
        ret = gst_app_src_push_buffer(GST_APP_SRC(_pAppSrc), b->pBuffer);

        if(ret != GST_FLOW_OK)
            qCDebug(MMCVideoLog()) << "Can't push buffer to the pipeline (" << ret << ")";
        else
            unref = false;  // Don't unref since gst_app_src_push_buffer() steals one reference and takes ownership
    }
} else if(result == LIBUSB_TRANSFER_CANCELLED)
{
    qCDebug(MMCVideoLog()) << "! Buffer canceled";
} else {
    qCDebug(MMCVideoLog()) << "? Buffer result = " << result;
}    

if(unref)
    gst_buffer_unref(b->pBuffer);

This is what I got from Android logcat from an affected machine:

[07-22 18:37:45.753 17414:18734 E/QGroundControl]
VideoReceiverLog: GStreamer error: [element ' "mp4mux0" ']  Could not multiplex stream.

[07-22 18:37:45.753 17414:18734 E/QGroundControl]
VideoReceiverLog: Details:  ../gst/isomp4/gstqtmux.c(5010): gst_qt_mux_add_buffer (): /GstPipeline:receiver/GstBin:sinkbin/GstMP4Mux:mp4mux0:
Buffer has no PTS.

What I’ve tried:

  • Setting GstBaseParse's pts_interpolation and infer_ts to TRUE (see the sketch below)
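
(Roughly like this sketch, via the GstBaseParse setters. The lookup by the auto-generated name "h264parse0" is an assumption, and these setters are really meant to be called by parser subclasses, so this is a hack either way.)

#include <gst/base/gstbaseparse.h>

// Sketch: force h264parse to interpolate/infer timestamps
GstElement* parser = gst_bin_get_by_name(GST_BIN(pipeline), "h264parse0");
if (parser) {
    gst_base_parse_set_pts_interpolation(GST_BASE_PARSE(parser), TRUE);
    gst_base_parse_set_infer_ts(GST_BASE_PARSE(parser), TRUE);
    gst_object_unref(parser);
}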

So my questions are:

  • Can you see anything wrong with my code? What am I missing?
  • Can I rely on matroskamux to avoid the issue temporarily until I find the true cause?

Edit: I managed to reproduce it "in situ" while printing the PTS and DTS of every buffer, using a probe attached to the sink pad of my tee element, and found that the problem buffer has neither PTS nor DTS. Perhaps my h264parse or my capsfilter is doing something nasty between my appsrc and my tee?

07-28 17:54:49.025  1932  2047 D : PTS:  295659241497 DTS:  295659241497
07-28 17:54:49.026  1932  2047 D : PTS:  295682488791 DTS:  295682488791
07-28 17:54:49.053  1932  2047 D : PTS:  295710463127 DTS:  295710463127
07-28 17:54:49.054  1932  2047 D : PTS:  18446744073709551615  DTS:  18446744073709551615
07-28 17:54:49.054  1932  2047 E : ************** NO PTS
07-28 17:54:49.054  1932  2047 E : ************** NO DTS
07-28 17:54:49.110  1932  2047 D : PTS:  295738607214 DTS:  295738607214
07-28 17:54:49.111  1932  2199 E : GStreamer error: [element ' "mp4mux1" ']  Could not multiplex stream.
07-28 17:54:49.111  1932  2199 E : Details:  ../gst/isomp4/gstqtmux.c(5010): gst_qt_mux_add_buffer (): /GstPipeline:receiver/GstBin:sinkbin/GstMP4Mux:mp4mux1:
07-28 17:54:49.111  1932  2199 E : Buffer has no PTS.
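
For reference, the probe producing that output was along these lines (a minimal sketch; the function name and the tee variable are placeholders):

// Buffer probe that logs PTS/DTS of every buffer crossing the pad
static GstPadProbeReturn
dump_ts_probe(GstPad* pad, GstPadProbeInfo* info, gpointer user_data)
{
    GstBuffer* buffer = GST_PAD_PROBE_INFO_BUFFER(info);
    g_print("PTS: %" G_GUINT64_FORMAT " DTS: %" G_GUINT64_FORMAT "\n",
            GST_BUFFER_PTS(buffer), GST_BUFFER_DTS(buffer));
    if (!GST_BUFFER_PTS_IS_VALID(buffer))
        g_printerr("************** NO PTS\n");
    if (!GST_BUFFER_DTS_IS_VALID(buffer))
        g_printerr("************** NO DTS\n");
    return GST_PAD_PROBE_OK;
}

// Attached once to the tee's sink pad after the pipeline is built
GstPad* teeSink = gst_element_get_static_pad(tee, "sink");
gst_pad_add_probe(teeSink, GST_PAD_PROBE_TYPE_BUFFER, dump_ts_probe, nullptr, nullptr);
gst_object_unref(teeSink);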

Edit 2: After much more digging, I found more clues. I wrote some code to dump every video packet coming through USB, along with a timestamp, to a binary file, and a player to play the dump back. Then I went to the field and had the customer fly the drone until the bug was triggered. This gives me a way to reproduce the error at will.

By using two probes, one attached to the 'src' pad of my 'appsrc' element and one attached to the 'sink' pad of my 'tee' element, I printed the PTS and DTS of every packet going through them.

Here are my findings:

TL;DR: At some (random) point, even though h264parse is being fed timestamped buffers, it outputs a buffer with no PTS and no DTS.

  • The hardware h264 encoder that produces the stream I'm getting through USB inserts SPS NALs without VUI, and I get lots of errors like "[parser] unable to compute timestamp: VUI not present" when setting the h264parse debug level to 6 (see the snippet after this list)

  • The "VUI not present" error is very consistent and shows up very frequently. Most of the time goes unnoticed, therefore I'm not 100% sure this is the cause

  • Since the h264 buffers pushed by appsrc have no particular alignment, while h264parse outputs au-aligned buffers, there is no direct relationship between the number of 512-byte buffers out of appsrc and the number of buffers out of h264parse. Therefore I think it's fairly safe to say there's no direct relationship between the timestamps generated by appsrc and those leaving h264parse either: h264parse must be recomputing them.

  • My h264 stream is quite simple: SPS->PPS->I-frame->(a number of P-frames). There are no B-frames, and every frame is self-contained in one big fat slice (just 1 NAL).
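
(To surface those parser messages, the h264parse debug threshold can be raised either with GST_DEBUG=h264parse:6 in the environment or programmatically; level 6 is GST_LEVEL_LOG:)

// Equivalent to running with GST_DEBUG=h264parse:6
gst_debug_set_threshold_for_name("h264parse", GST_LEVEL_LOG);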

So, now I'm trying to choose between these two options:

  1. Manually calculate and insert a fake SPS NAL with VUI every time a VUI-less SPS is detected. The stream has a variable framerate because of the varying radio signal strength (although it's encoded at 60 fps), while the resolution and picture properties never change.
  2. Parse the stream myself to provide timestamped au-aligned buffers to the pipeline.

Solution

  • After much head banging, I finally figured out the root cause of this. And it's a bit obscure...

    The wireless video transmitter in the drone can dynamically change the video bitrate depending on the radio link's available bandwidth. Or, put another way: when the drone is too far away or there is strong interference, the video quality degrades.

    When this happens, video frames (each contained in a single slice within a single NAL) become significantly smaller. Since I'm reading 512-byte chunks of an h264 stream with no particular alignment and forwarding them to GStreamer as GstBuffers, if a frame needs fewer than 512 bytes there is a chance that one buffer contains several frames. h264parse then sees these as N different frames with identical timestamps. Its default behavior in that case seems to be to ignore both the upstream PTS and DTS and to compute the timestamp from the frame duration, by reading the VUI from the SPS, which is not present in my stream. As a result, the buffer leaving the source pad of h264parse has no PTS and no DTS, which makes mp4mux complain.

    As I've previously mentioned, my stream is quite simple, so I wrote a simple parser to detect the beginning of each NAL. This way I can 'unpack' the stream coming from the USB hardware and make sure that every buffer pushed into my pipeline contains exactly one NAL (and therefore at most one frame), independently timestamped.
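
    Here's a minimal sketch of the idea, assuming Annex-B 00 00 01 / 00 00 00 01 start codes (class, callback, and variable names are illustrative, not my production code):

    #include <gst/gst.h>
    #include <gst/app/gstappsrc.h>
    #include <functional>
    #include <vector>

    // Sketch: split an unaligned Annex-B h264 byte stream into
    // single-NAL chunks. Data accumulates until the next start code,
    // at which point the preceding NAL is emitted whole.
    class NalSplitter
    {
    public:
        // emit() receives exactly one NAL, including its start code
        explicit NalSplitter(std::function<void(const guint8*, gsize)> emit)
            : _emit(std::move(emit)) {}

        void feed(const guint8* data, gsize size)
        {
            _acc.insert(_acc.end(), data, data + size);

            gsize nalStart = 0; // assumes the stream begins at a start code
            gsize i = 1;        // skip into the current start code
            while (i + 3 <= _acc.size()) {
                if (_acc[i] == 0 && _acc[i + 1] == 0 && _acc[i + 2] == 1) {
                    // A preceding zero byte makes it a 4-byte start code
                    gsize start = (i > nalStart && _acc[i - 1] == 0) ? i - 1 : i;
                    if (start > nalStart)
                        _emit(_acc.data() + nalStart, start - nalStart);
                    nalStart = start;
                    i = start + 3;  // jump past this start code
                } else {
                    ++i;
                }
            }
            // Keep the (possibly incomplete) current NAL for the next feed()
            _acc.erase(_acc.begin(), _acc.begin() + nalStart);
        }

    private:
        std::vector<guint8> _acc;
        std::function<void(const guint8*, gsize)> _emit;
    };

    Each emitted NAL is then copied into its own GstBuffer and pushed, so appsrc (do-timestamp=TRUE) stamps every NAL individually:

    // Hypothetical emit callback: one GstBuffer per NAL
    NalSplitter splitter([this](const guint8* nal, gsize len) {
        GstBuffer* buf = gst_buffer_new_allocate(nullptr, len, nullptr);
        gst_buffer_fill(buf, 0, nal, len);
        gst_app_src_push_buffer(GST_APP_SRC(_pAppSrc), buf);  // takes ownership
    });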

    And for redundancy, I added a probe attached to the sink pad of my tee element to make sure every buffer going through it carries valid timestamps. If one doesn't, its timestamps are forced to the element's running time, like this:

    static GstPadProbeReturn
    fixTimestampProbe(GstPad* pad, GstPadProbeInfo* info, gpointer /*user_data*/)
    {
        GstBuffer* buffer = GST_PAD_PROBE_INFO_BUFFER(info);

        if (!GST_BUFFER_PTS_IS_VALID(buffer) || !GST_BUFFER_DTS_IS_VALID(buffer)) {
            GstElement* elm = gst_pad_get_parent_element(pad);
            qCDebug(VideoReceiverLog) << "Invalid timestamp out of source. Replacing with element running time.";
            GstClockTime ts = gst_element_get_current_running_time(elm);
            GST_BUFFER_PTS(buffer) = ts;
            GST_BUFFER_DTS(buffer) = ts;
            gst_object_unref(elm);  // gst_pad_get_parent_element() returns a new ref
        }

        return GST_PAD_PROBE_OK;
    }
    

    After doing this I can no longer reproduce the issue with my test dump.