ios · audio · avfoundation · mpeg · http-live-streaming

Audio equivalent of SPS and PPS when muxing Annex B MPEG-TS? What is "DecoderInfo"?


I'm using the Bento4 library to mux an Annex B MPEG-2 transport stream (.ts) file from the H.264 video and AAC audio streams I'm generating with VideoToolbox and AVFoundation respectively, as source data for an HLS (HTTP Live Streaming) stream. This question is not necessarily Bento4-specific: I'm trying to understand the underlying concepts so that I can accomplish the task, preferably with Apple's libraries.

So far, I've figured out how to create an AP4_AvcSampleDescription by pulling various pieces of data out of my CMVideoFormatDescriptionRef, most importantly by extracting the SPS and PPS with CMVideoFormatDescriptionGetH264ParameterSetAtIndex (indices 0 and 1 respectively), which I can hand to Bento4 as plain byte buffers. Great, that's all the header information I need to ask Bento4 to mux video into a .ts file!
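
Roughly, that extraction looks like the following sketch (error handling omitted; videoFormat stands for the CMVideoFormatDescriptionRef that comes back from the encoder):

    const uint8_t *sps = NULL, *pps = NULL;
    size_t spsSize = 0, ppsSize = 0, parameterSetCount = 0;
    int nalUnitHeaderLength = 0;

    // Index 0 is the SPS, index 1 is the PPS; the count and NAL-unit-header-length
    // out-parameters are optional, so they only need to be requested once.
    CMVideoFormatDescriptionGetH264ParameterSetAtIndex(
        videoFormat, 0, &sps, &spsSize, &parameterSetCount, &nalUnitHeaderLength);
    CMVideoFormatDescriptionGetH264ParameterSetAtIndex(
        videoFormat, 1, &pps, &ppsSize, NULL, NULL);
    // sps/spsSize and pps/ppsSize are the raw byte buffers Bento4 wants.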

Now I'm trying to mux audio into the same file. I'm using my CMAudioFormatDescriptionRef to get the information required to construct my AP4_MpegAudioSampleDescription, which Bento4 uses to make the necessary QT atoms and headers. However, one of the fields is a "decoder info" byte buffer, with no explanation of what it is or how to generate one. I would have hoped for a CMAudioFormatDescriptionGetDecoderInfo or something similar, but I can't find anything like that. Is there such a function in any Apple library? Or is there a spec I haven't found on how to generate this data?

Or alternatively, am I walking down the wrong path? Is there an easier way to mux .ts files from a Mac/iOS code base?


Solution

  • Muxing audio into an MPEG-TS stream is surprisingly easy, and does not require a complex header the way a video stream does! Each sample buffer just needs a 7-byte ADTS header prepended before you write it out as a PES packet (see the sketch of the header bit layout after the code below).

    Bento4 only uses the "DecoderInfo" buffer to parse it into an AP4_Mp4AudioDecoderConfig instance, from which it extracts the information needed for the ADTS header. Rather than acquiring that data so indirectly, I made a copy of AP4_Mpeg2TsAudioSampleStream::WriteSample that writes a CMSampleBufferRef directly. It could easily be generalized for other audio frameworks, but I'll paste it as-is here for reference (a note on the DecoderInfo buffer itself follows the code):

    // These two functions are copy-pasted from Ap4Mpeg2Ts.cpp
    static unsigned int GetSamplingFrequencyIndex(unsigned int sampling_frequency) { ... }
    static void
    MakeAdtsHeader(unsigned char *bits,
                   size_t  frame_size,
                   unsigned int  sampling_frequency_index,
                   unsigned int  channel_configuration) { ... }
    
    static const size_t kAdtsHeaderLength = 7;
    
    - (void)appendAudioSampleBuffer2:(CMSampleBufferRef)sampleBuffer
    {
        // Get the actual audio data from the block buffer.
        CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
        size_t blockBufferLength = CMBlockBufferGetDataLength(blockBuffer);
    
        // Get the audio meta-data from its AudioFormatDescRef
        CMAudioFormatDescriptionRef audioFormat = CMSampleBufferGetFormatDescription(sampleBuffer);
        const AudioStreamBasicDescription *asbd = CMAudioFormatDescriptionGetStreamBasicDescription(audioFormat);
    
        // These are the values we will need to build our ADTS header
        unsigned int sample_rate = asbd->mSampleRate; // Float64 -> int is fine for the standard AAC rates
        unsigned int channel_count = asbd->mChannelsPerFrame;
        unsigned int sampling_frequency_index = GetSamplingFrequencyIndex(sample_rate);
        // For mono and stereo AAC, the ADTS channel_configuration field is simply the channel count.
        unsigned int channel_configuration = channel_count;
    
        // Create a byte buffer with first the header, and then the sample data.
        NSMutableData *buffer = [NSMutableData dataWithLength:kAdtsHeaderLength + blockBufferLength];
        MakeAdtsHeader((unsigned char*)[buffer mutableBytes], blockBufferLength, sampling_frequency_index, channel_configuration);
        CMBlockBufferCopyDataBytes(blockBuffer, 0, blockBufferLength, ((char*)[buffer mutableBytes])+kAdtsHeaderLength);
    
        // Calculate a timestamp int64 that Bento4 can use, by converting our CMTime into an Int64 in the timescale of the audio stream.
        CMTime presentationTime = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);
        AP4_UI64 ts = CMTimeConvertScale(presentationTime, _audioStream->m_TimeScale, kCMTimeRoundingMethod_Default).value;
    
        _audioStream->WritePES(
            (const unsigned char*)[buffer bytes],
            (unsigned int)[buffer length],
            ts,
            false, // don't need a decode timestamp for audio
            ts,
            true, // do write a presentation timestamp so we can sync a/v
            *_output
        );
    }
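
    As promised above, here is a sketch of what those two elided functions produce. This is the standard ADTS bit layout for AAC-LC with MPEG-4 syntax and no CRC; Bento4's own MakeAdtsHeader in Ap4Mpeg2Ts.cpp is the authoritative version, so treat this as illustration rather than a drop-in replacement:

    // Sketch of the 7-byte ADTS header layout (AAC-LC, MPEG-4 syntax, no CRC).
    // frame_size is the AAC payload size, NOT including the header itself.
    static void MakeAdtsHeaderSketch(unsigned char bits[7],
                                     size_t frame_size,
                                     unsigned int sampling_frequency_index,
                                     unsigned int channel_configuration)
    {
        size_t full_size = frame_size + 7; // the frame_length field covers the header too
        bits[0] = 0xFF;                    // syncword, high 8 bits
        bits[1] = 0xF1;                    // syncword low 4 bits, MPEG-4, layer 00, no CRC
        bits[2] = 0x40                     // profile: AAC-LC (audio object type 2, minus one)
                | (sampling_frequency_index << 2)
                | (channel_configuration >> 2);
        bits[3] = ((channel_configuration & 0x3) << 6) | ((full_size >> 11) & 0x3);
        bits[4] = (full_size >> 3) & 0xFF;
        bits[5] = ((full_size << 5) & 0xFF)
                | 0x1F;                    // plus the top 5 bits of buffer fullness (0x7FF = VBR)
        bits[6] = 0xFC;                    // rest of buffer fullness, one raw data block per frame
    }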
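
    And to answer the literal question in the title: the "DecoderInfo" buffer Bento4 asks for is, as far as I can tell, the MPEG-4 AudioSpecificConfig from ISO/IEC 14496-3, the same blob that ends up inside the esds box of an MP4 file. For plain AAC-LC it is just two bytes, so if you ever do need to construct one, here is a sketch built from the same values used above:

    // Sketch: build the two-byte MPEG-4 AudioSpecificConfig for AAC-LC.
    // Layout: 5 bits audio object type, 4 bits sampling frequency index,
    // 4 bits channel configuration, 3 zero bits (GASpecificConfig flags).
    static void MakeAacLcDecoderInfo(unsigned char info[2],
                                     unsigned int sampling_frequency_index,
                                     unsigned int channel_configuration)
    {
        const unsigned int object_type = 2; // 2 = AAC-LC
        info[0] = (object_type << 3) | (sampling_frequency_index >> 1);
        info[1] = ((sampling_frequency_index & 1) << 7) | (channel_configuration << 3);
    }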