I'm using the Bento4 library to mux an Annex B TS (MPEG-2 transport stream) file with my H.264 video and AAC audio streams that are being generated from VideoToolbox and AVFoundation respectively, as source data for an HLS (HTTP Live Streaming) stream. This question is not necessarily Bento4-specific: I'm trying to understand the underlying concepts so that I can accomplish the task, preferably by using Apple libs.
So far, I've figured out how to create an AP4_AvcSampleDescription
by getting various kinds of data out of my CMVideoFormatDescriptionRef
, and most importantly by extracting the SPS and PPS (indices 0 and 1, respectively, from CMVideoFormatDescriptionGetH264ParameterSetAtIndex)
that I can just stick as byte buffers into Bento4. Great, that's all the header information I need so that I can ask Bento4 to mux video into a ts file!
Now I'm trying to mux audio into the same file. I'm using my CMAudioFormatDescriptionRef
to get the required information to construct my AP4_MpegAudioSampleDescription
which Bento4 uses to make the necessary QT atoms and headers. However, one of the fields is a "decoder info" byte buffer, with no explanation of what it is or how to generate one from my data. I would have hoped for a CMAudioFormatDescriptionGetDecoderInfo
or something, but I can't find anything like that. Is there such a function in any Apple library? Or is there a nice spec that I haven't found on how to generate this data?
Or alternatively, am I walking down the wrong path? Is there an easier way to mux ts files from a Mac/iOS code base?
Muxing audio into an MPEG-TS is surprisingly easy, and does not require a complex header like a video stream does! All it takes is prepending a 7-byte ADTS header to each sample buffer before writing it out as a PES.
Bento4 only uses the "DecoderInfo" buffer in order to parse it into an AP4_Mp4AudioDecoderConfig
instance, so that it can extract the information needed for the ADTS header. Rather than acquiring that data in such a roundabout way, I made a copy of AP4_Mpeg2TsAudioSampleStream::WriteSample that writes a CMSampleBufferRef instead. It could easily be generalized for other audio frameworks, but I'll just paste it as-is here for reference:
// These two functions are copy-pasted from Ap4Mpeg2Ts.cpp
static unsigned int GetSamplingFrequencyIndex(unsigned int sampling_frequency) { ... }

static void
MakeAdtsHeader(unsigned char *bits,
               size_t frame_size,
               unsigned int sampling_frequency_index,
               unsigned int channel_configuration) { ... }

static const size_t kAdtsHeaderLength = 7;

- (void)appendAudioSampleBuffer2:(CMSampleBufferRef)sampleBuffer
{
    // Get the actual audio data from the block buffer.
    CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    size_t blockBufferLength = CMBlockBufferGetDataLength(blockBuffer);

    // Get the audio meta-data from its AudioFormatDescRef
    CMAudioFormatDescriptionRef audioFormat = CMSampleBufferGetFormatDescription(sampleBuffer);
    const AudioStreamBasicDescription *asbd = CMAudioFormatDescriptionGetStreamBasicDescription(audioFormat);

    // These are the values we will need to build our ADTS header
    unsigned int sample_rate = asbd->mSampleRate;
    unsigned int channel_count = asbd->mChannelsPerFrame;
    unsigned int sampling_frequency_index = GetSamplingFrequencyIndex(sample_rate);
    unsigned int channel_configuration = channel_count;

    // Create a byte buffer with first the header, and then the sample data.
    NSMutableData *buffer = [NSMutableData dataWithLength:kAdtsHeaderLength + blockBufferLength];
    MakeAdtsHeader((unsigned char *)[buffer mutableBytes], blockBufferLength, sampling_frequency_index, channel_configuration);
    CMBlockBufferCopyDataBytes(blockBuffer, 0, blockBufferLength, ((char *)[buffer mutableBytes]) + kAdtsHeaderLength);

    // Calculate a timestamp int64 that Bento4 can use, by converting our CMTime
    // into an Int64 in the timescale of the audio stream.
    CMTime presentationTime = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);
    AP4_UI64 ts = CMTimeConvertScale(presentationTime, _audioStream->m_TimeScale, kCMTimeRoundingMethod_Default).value;

    _audioStream->WritePES(
        (const unsigned char *)[buffer bytes],
        (unsigned int)[buffer length],
        ts,
        false, // don't need a decode timestamp for audio
        ts,
        true,  // do write a presentation timestamp so we can sync a/v
        *_output);
}