Tags: video, http-live-streaming, aac, mediastreamsegmenter

What does the Apple segmenter "optimize" option actually do to AAC audio?


I'm writing a Flash video player to play Apple HLS video streams, and I'm finding that any content whose AAC audio track has been segmented by the Apple tools with the -optimize option enabled (now the default) has an audio track I cannot decode: the sync bytes aren't where I think they should be.

What does the optimize option do to the audio? Is it re-encoding it or just packing it differently?

Most importantly, what do I need to do in order to read the audio track correctly?

I've been searching for some months now, but no one seems to have a technically useful answer to this (i.e. anything beyond "It makes the files smaller").

This appears to affect only the audio track: if I disable audio decoding, the video plays back just fine in every case I've seen so far (Apple tools, ffmpeg, commercial encoders, etc.).


Solution

  • Ok, so after some experimenting I think I've found the answer to my question.

    Normally, AAC frames are packed such that a (small) whole number of AAC frames fits within a single Payload Unit, roughly interleaved in PTS order with the video frames they synchronize with. These Payload Units are then packed into the payload space of consecutive 188-byte TS packets, with the empty space in the last TS packet padded with junk (i.e. stuffing that is not part of the data stream). In a 10-second TS segment this can add up to an overhead on the order of roughly 2-6 KB.
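    As a rough illustration of where that padding comes from, the stuffing wasted by one Payload Unit can be sketched as below. This assumes the common case of a 4-byte TS header, leaving 184 bytes of payload per packet; adaptation fields would reduce that, so treat the numbers as indicative only.

    ```python
    TS_PACKET_SIZE = 188
    TS_HEADER_SIZE = 4                             # minimal header; adaptation fields add more
    TS_PAYLOAD = TS_PACKET_SIZE - TS_HEADER_SIZE   # 184 usable bytes per packet

    def stuffing_bytes(payload_unit_size: int) -> int:
        """Padding wasted in the last TS packet carrying a Payload Unit
        of the given size."""
        remainder = payload_unit_size % TS_PAYLOAD
        return 0 if remainder == 0 else TS_PAYLOAD - remainder
    ```

    Dozens of small audio Payload Units per segment, each wasting up to 183 bytes of stuffing, account for the few KB that optimization saves.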

    With AAC optimization, two things appear to change.

    1. The size of the Payload Units containing the AAC frames is increased, reducing the overall number of Payload Units.
    2. The Payload Units are sized to an exact multiple of the payload space in a TS packet, rather than to fit a whole number of AAC frames.

    This means that padding is all but eliminated: all of the space used is valuable data, so the overall size is reduced.

    In addition, this means that AAC frames are no longer immediately adjacent to the video frames they should sync with; in fact, they may be a considerable distance apart.

    This also means, however, that a whole AAC frame may not fit at the end of an individual Payload Unit, so as much of the frame as will fit is put in this Payload Unit, and the rest is placed at the beginning of the next AAC Payload Unit. In other words, AAC Payload Units may not start with AAC frame sync bytes (which is exactly what I was seeing!).
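    A quick way to see this symptom in a decoder is to check whether a Payload Unit begins with the 12-bit ADTS syncword (0xFFF). This sketch assumes the audio is carried in ADTS framing, which is the usual arrangement in a TS:

    ```python
    def starts_with_adts_sync(payload: bytes) -> bool:
        """True if the buffer begins with the 12-bit ADTS syncword 0xFFF
        (0xFF followed by a byte whose top four bits are 0xF)."""
        return (len(payload) >= 2
                and payload[0] == 0xFF
                and (payload[1] & 0xF0) == 0xF0)
    ```

    With optimized content, this check fails for any Payload Unit that begins with the continuation of a split frame.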

    Whenever the AAC data read so far is not long enough to contain a whole header, or the frame length given in the AAC header is greater than the data remaining in the Payload Unit, the rest of the frame must be in the next AAC Payload Unit. Note, however, that this may be in the next segment!
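    Putting this together, one way to read the audio correctly is to stop treating Payload Unit boundaries as frame boundaries and instead handle the audio PID as a continuous byte stream: append each Payload Unit to a carry-over buffer and extract complete frames using the frame-length field in the ADTS header. A minimal sketch, assuming ADTS framing (the 13-bit frame_length field spans header bytes 3-5); a real player would resync by scanning for 0xFFF on error rather than raising:

    ```python
    def extract_adts_frames(buffer: bytearray, payload: bytes) -> list[bytes]:
        """Append a Payload Unit's bytes to a carry-over buffer and pull out
        every complete ADTS frame. Whatever is left stays buffered for the
        next Payload Unit, which may arrive in the next segment."""
        buffer.extend(payload)
        frames = []
        while len(buffer) >= 7:                     # 7 bytes = minimal ADTS header
            if not (buffer[0] == 0xFF and (buffer[1] & 0xF0) == 0xF0):
                raise ValueError("lost ADTS sync")  # real code would resync here
            # frame_length: 13 bits spanning header bytes 3, 4 and 5
            frame_len = ((buffer[3] & 0x03) << 11) | (buffer[4] << 3) | (buffer[5] >> 5)
            if len(buffer) < frame_len:             # tail of frame is in a later unit
                break
            frames.append(bytes(buffer[:frame_len]))
            del buffer[:frame_len]
        return frames
    ```

    The buffer must persist across Payload Units and across segment boundaries; resetting it per segment would drop the split frames that -optimize creates.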