Tags: gstreamer, h.264, flv

How do I properly unwrap FLV video into raw and valid h264 segments for gstreamer buffers?


I have written an RTMP server in Rust that successfully allows RTMP publishers to connect and push a video stream, and RTMP clients to connect and watch those streams.

When a video RTMP packet comes in, I attempt to unwrap the video from the FLV container via:

    // TODO: The FLV spec has the AVCPacketType and composition time as the first parts of the
    // AVCVIDEOPACKET.  It's unclear if these two fields are part of h264 or FLV specific.
    let flv_tag = data.split_to(1); // first byte of the video tag: frame type + codec id
    let is_sequence_header;
    let codec = if flv_tag[0] & 0x07 == 0x07 {
        // Codec id 7 is AVC/h264; an AVCPacketType of 0 marks the sequence header
        is_sequence_header = data[0] == 0x00;
        VideoCodec::H264
    } else {
        is_sequence_header = false;
        VideoCodec::Unknown
    };

    let is_keyframe = flv_tag[0] & 0x10 == 0x10; // frame type 1 == keyframe

After this runs, data contains the AVCVIDEOPACKET with the FLV tag removed. When I send this video to other RTMP clients, I just prepend the correct FLV tag to it and send it off.
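
For reference, this is my understanding of what data holds at this point (the AVCVIDEOPACKET body as laid out by the FLV spec). The struct below is purely illustrative; it is not a type my server actually defines:

    // Layout of the AVCVIDEOPACKET left in `data` once the one-byte video tag header has
    // been split off (per the FLV spec). Hypothetical type, shown for illustration only.
    use bytes::Bytes;

    struct AvcVideoPacket {
        packet_type: u8,       // AVCPacketType: 0 = sequence header, 1 = NALU, 2 = end of sequence
        composition_time: i32, // SI24, milliseconds; pts offset relative to the tag's timestamp
        data: Bytes,           // AVCDecoderConfigurationRecord for type 0, otherwise AVCC NAL units
    }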

Now I am trying to pass the video packets to GStreamer in order to do in-process transcoding. To do this I set up an appsrc ! avdec_h264 pipeline, and gave the appsrc element the following caps:

        video_source.set_caps(Some(
            &Caps::builder("video/x-h264")
                .field("alignment", "nal")
                .field("stream-format", "byte-stream")
                .build()
        ));
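
For completeness, here is a minimal sketch of how such a pipeline can be assembled with gstreamer-rs. The sink element, error handling, and the exact ElementFactory::make signature (the pre-0.19 form that takes an optional name) are assumptions for illustration rather than my actual setup:

    use gstreamer as gst;
    use gstreamer_app as gst_app;
    use gst::prelude::*;

    fn build_pipeline() -> Result<(gst::Pipeline, gst_app::AppSrc), Box<dyn std::error::Error>> {
        gst::init()?;

        let pipeline = gst::Pipeline::new(None);
        let source = gst::ElementFactory::make("appsrc", Some("video_source"))?;
        let decoder = gst::ElementFactory::make("avdec_h264", Some("video_decode"))?;
        let sink = gst::ElementFactory::make("autovideosink", None)?;

        pipeline.add_many(&[&source, &decoder, &sink])?;
        gst::Element::link_many(&[&source, &decoder, &sink])?;

        // The appsrc element gets downcast so caps can be set and buffers pushed on it later
        let video_source = source
            .downcast::<gst_app::AppSrc>()
            .expect("appsrc element should downcast to AppSrc");
        video_source.set_format(gst::Format::Time);

        Ok((pipeline, video_source))
    }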

Now when an RTMP publisher sends a video packet, I take the (attempted) unwrapped video packet and pass it to my appsrc via:

    pub fn push_video(&self, data: Bytes, timestamp: RtmpTimestamp) {
        let mut buffer = Buffer::with_size(data.len()).unwrap();
        {
            let buffer = buffer.get_mut().unwrap();
            // The RTMP timestamp is in milliseconds
            buffer.set_pts(ClockTime::MSECOND * timestamp.value as u64);

            // Copy the packet bytes into the writable gstreamer buffer
            let mut samples = buffer.map_writable().unwrap();
            {
                let samples = samples.as_mut_slice();
                for index in 0..data.len() {
                    samples[index] = data[index];
                }
            }
        }

        self.video_source.push_buffer(buffer).unwrap();
    }

When this occurs, the following GStreamer debug output appears:

[2022-02-09T18:25:15Z INFO  gstreamer_mmids_scratchpad] Pushing packet #0 (is_sequence_header:true, is_keyframe=true)
[2022-02-09T18:25:15Z INFO  gstreamer_mmids_scratchpad] Connection 63397d56-16fb-4b54-a622-d991b5ad2d8e sent audio data
0:00:05.531722000  7516 000001C0C04011C0 INFO               GST_EVENT gstevent.c:973:gst_event_new_segment: creating segment event bytes segment start=0, offset=0, stop=-1, rate=1.000000, applied_rate=1.000000, flags=0x00, time=0, base=0, position 0, duration -1
0:00:05.533525000  7516 000001C0C04011C0 INFO                 basesrc gstbasesrc.c:3018:gst_base_src_loop:<video_source> marking pending DISCONT
0:00:05.535385000  7516 000001C0C04011C0 WARN            videodecoder gstvideodecoder.c:2818:gst_video_decoder_chain:<video_decode> Received buffer without a new-segment. Assuming timestamps start from 0.
0:00:05.537381000  7516 000001C0C04011C0 INFO               GST_EVENT gstevent.c:973:gst_event_new_segment: creating segment event time segment start=0:00:00.000000000, offset=0:00:00.000000000, stop=99:99:99.999999999, rate=1.000000, applied_rate=1.000000, flags=0x00, time=0:00:00.000000000, base=0:00:00.000000000, position 0:00:00.000000000, duration 99:99:99.999999999
[2022-02-09T18:25:15Z INFO  gstreamer_mmids_scratchpad] Pushing packet #1 (is_sequence_header:false, is_keyframe=true)
0:00:05.563445000  7516 000001C0C04011C0 INFO                   libav :0:: Invalid NAL unit 0, skipping.
[2022-02-09T18:25:15Z INFO  gstreamer_mmids_scratchpad] Pushing packet #2 (is_sequence_header:false, is_keyframe=false)
0:00:05.579274000  7516 000001C0C04011C0 ERROR                  libav :0:: No start code is found.
0:00:05.581338000  7516 000001C0C04011C0 ERROR                  libav :0:: Error splitting the input into NAL units.
0:00:05.583337000  7516 000001C0C04011C0 WARN                   libav gstavviddec.c:2068:gst_ffmpegviddec_handle_frame:<video_decode> Failed to send data for decoding
[2022-02-09T18:25:15Z INFO  gstreamer_mmids_scratchpad] Pushing packet #3 (is_sequence_header:false, is_keyframe=false)
0:00:05.595253000  7516 000001C0C04011C0 ERROR                  libav :0:: No start code is found.
0:00:05.597204000  7516 000001C0C04011C0 ERROR                  libav :0:: Error splitting the input into NAL units.
0:00:05.599262000  7516 000001C0C04011C0 WARN                   libav gstavviddec.c:2068:gst_ffmpegviddec_handle_frame:<video_decode> Failed to send data for decoding

Based on this I figured the errors might be caused by the non-data portions of the AVCVIDEOPACKET, which are FLV specific rather than part of the h264 stream. So I tried ignoring the first 4 bytes (the AVCPacketType and CompositionTime fields) of each packet I wrote to the buffer:

    pub fn push_video(&self, data: Bytes, timestamp: RtmpTimestamp) {
        // Allocate a buffer without the 4-byte AVCPacketType + CompositionTime prefix
        let mut buffer = Buffer::with_size(data.len() - 4).unwrap();
        {
            let buffer = buffer.get_mut().unwrap();
            buffer.set_pts(ClockTime::MSECOND * timestamp.value as u64);

            let mut samples = buffer.map_writable().unwrap();
            {
                // Copy everything after the first 4 bytes into the gstreamer buffer
                let samples = samples.as_mut_slice();
                for index in 4..data.len() {
                    samples[index - 4] = data[index];
                }
            }
        }

        self.video_source.push_buffer(buffer).unwrap();
    }

This essentially gave me the same logging output and errors. This is reproducible with the h264parse plugin as well.

What am I missing in the unwrapping process to pass raw h264 video to gstreamer?

Edit:

Realizing I misread the pad template, I tried the following caps instead:

        video_source.set_caps(Some(
            &Caps::builder("video/x-h264")
                .field("alignment", "au")
                .field("stream-format", "avc")
                .build()
        ));

This also failed with pretty similar output.


Solution

  • I think I finally figured this out.

    The first thing is that I need to remove the AVCVIDEOPACKET headers (the AVCPacketType and CompositionTime fields). These are not part of the h264 format and thus cause parsing errors.

    The second thing I needed to do was to not pass the sequence header to the source as a buffer. Instead, the sequence header bytes need to be set as the codec_data field on the appsrc's caps. With this in place, passing the video data through h264parse no longer produces parsing errors, and I even get a correctly sized video window.

    The third thing I was missing was the correct dts and pts values. It turns out the RTMP timestamp I'm given is the dts, and pts = AVCVIDEOPACKET.CompositionTime + dts. A sketch combining all three fixes is below.
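
    Putting the three fixes together, here is a sketch of how push_video could look. It reuses the names from my earlier snippets, but the codec_data handling and the CompositionTime parsing are illustrative and may need adjusting (for example, for your gstreamer-rs version); it is not verbatim the code I ended up with:

        pub fn push_video(&self, data: Bytes, timestamp: RtmpTimestamp) {
            // 1. Strip the AVCVIDEOPACKET header: 1 byte AVCPacketType + 3 bytes CompositionTime.
            //    Only the bytes after these four are actual h264 (AVCC formatted) data.
            let packet_type = data[0];

            // CompositionTime is a signed 24-bit (SI24) big-endian value in milliseconds;
            // shift up and back down to sign-extend it into an i32
            let composition_time =
                ((((data[1] as i32) << 16) | ((data[2] as i32) << 8) | (data[3] as i32)) << 8) >> 8;

            let payload = data.slice(4..);

            // 2. The sequence header (AVCPacketType == 0) is the AVCDecoderConfigurationRecord.
            //    Don't push it as a buffer; attach it to the appsrc caps as codec_data instead.
            if packet_type == 0 {
                let codec_data = Buffer::from_slice(payload);
                self.video_source.set_caps(Some(
                    &Caps::builder("video/x-h264")
                        .field("alignment", "au")
                        .field("stream-format", "avc")
                        .field("codec_data", codec_data)
                        .build(),
                ));
                return;
            }

            // 3. The RTMP timestamp is the dts; pts = CompositionTime + dts
            //    (assuming a non-negative composition time here)
            let dts = ClockTime::MSECOND * timestamp.value as u64;
            let pts = dts + ClockTime::MSECOND * composition_time as u64;

            let mut buffer = Buffer::from_slice(payload);
            {
                let buffer = buffer.get_mut().unwrap();
                buffer.set_dts(dts);
                buffer.set_pts(pts);
            }

            self.video_source.push_buffer(buffer).unwrap();
        }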