Tags: audio, directshow, audio-streaming

How is a 24-bit audio stream delivered to the graph?


This is probably a very silly question, but after searching for a while, I couldn't find a straight answer.

If a filter such as the LAV Audio decoder is producing a 24-bit integer audio stream, how are individual audio samples delivered to the graph? (For simplicity, let's consider a monophonic stream.)

Is each sample stored in a 32-bit integer with the most significant bits unused, or are the samples packed, with the least significant bits of the next sample occupying the spare, most significant bits of the current one?


Solution

  • The format is similar to 16-bit PCM: the values are signed, little-endian integers.

    With 24-bit audio you normally describe the format with the WAVEFORMATEXTENSIBLE structure rather than plain WAVEFORMATEX (certain filters do accept the latter, but in general you are expected to use the former).

    The structure carries two related fields: wBitsPerSample (the container size) and wValidBitsPerSample. It is therefore possible to represent 24-bit data either as plain 24-bit values or as 24 meaningful bits inside 32-bit containers. The payload data must match whichever format you declare (a sketch of filling in this structure follows at the end of this answer).

    Bits of different samples are never mixed within a byte; as the documentation puts it:

    However, wBitsPerSample is the container size and must be a multiple of 8, whereas wValidBitsPerSample can be any value not exceeding the container size. For example, if the format uses 20-bit samples, wBitsPerSample must be at least 24, but wValidBitsPerSample is 20.

    To the best of my knowledge, plain 24-bit values are typical, that is, three bytes per PCM sample (a second sketch below shows how such samples are unpacked).

    Non-PCM formats may define different packing and use the "unused" bits more efficiently, so that, for example, two samples of 20-bit audio consume five bytes.
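
    To make the two representations concrete, here is a minimal C++ sketch of filling in WAVEFORMATEXTENSIBLE for a mono PCM stream. The helper name MakeMono24BitPcm and the parameter choices are illustrative, not part of the DirectShow API:

    ```cpp
    #include <windows.h>
    #include <mmreg.h>
    #include <ks.h>
    #include <ksmedia.h>   // KSDATAFORMAT_SUBTYPE_PCM

    // containerBits is 24 (packed, three bytes per sample) or
    // 32 (24 valid bits carried in a four-byte container).
    static WAVEFORMATEXTENSIBLE MakeMono24BitPcm(DWORD sampleRate, WORD containerBits)
    {
        WAVEFORMATEXTENSIBLE wfx = {};
        wfx.Format.wFormatTag      = WAVE_FORMAT_EXTENSIBLE;
        wfx.Format.nChannels       = 1;
        wfx.Format.nSamplesPerSec  = sampleRate;
        wfx.Format.wBitsPerSample  = containerBits;      // container size, multiple of 8
        wfx.Format.nBlockAlign     = wfx.Format.nChannels * containerBits / 8;
        wfx.Format.nAvgBytesPerSec = sampleRate * wfx.Format.nBlockAlign;
        wfx.Format.cbSize          = sizeof(WAVEFORMATEXTENSIBLE) - sizeof(WAVEFORMATEX);
        wfx.Samples.wValidBitsPerSample = 24;            // meaningful bits per sample
        wfx.dwChannelMask          = SPEAKER_FRONT_CENTER;
        wfx.SubFormat              = KSDATAFORMAT_SUBTYPE_PCM;
        return wfx;
    }
    ```

    With containerBits = 24 every payload byte is audio; with containerBits = 32 each sample occupies a DWORD, and since the valid bits are expected to be left-justified in the container, the low byte of each little-endian DWORD is padding.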
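
    And here is a sketch of consuming the packed three-byte layout on the receiving side, assuming the signed little-endian representation described above; UnpackPcm24 is an illustrative name, not an SDK function:

    ```cpp
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Expand packed 24-bit PCM (little endian, signed) into sign-extended
    // 32-bit integers. Each sample is exactly three bytes and samples are
    // byte-aligned, so this is a fixed-stride loop, not a bit reader.
    std::vector<int32_t> UnpackPcm24(const uint8_t* data, size_t sampleCount)
    {
        std::vector<int32_t> out;
        out.reserve(sampleCount);
        for (size_t i = 0; i < sampleCount; ++i) {
            const uint8_t* p = data + 3 * i;
            // Place the three bytes in the top 24 bits, then shift right
            // arithmetically to sign-extend (well defined since C++20 and
            // arithmetic on all mainstream compilers before that).
            int32_t v = static_cast<int32_t>(
                (uint32_t)p[0] << 8 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 24) >> 8;
            out.push_back(v);
        }
        return out;
    }
    ```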