
Enumerating AAC audio formats yields incomplete WAVEFORMATEX results


[Note that I use Windows API WAVEFORMATEX structs for legacy reasons. If you don't use WAVEFORMATEX with the Media Foundation lib, you won't have the problem described in this post.]

To enumerate the MP3 formats available from the built-in codec, I used to call MFTranscodeGetAudioOutputAvailableTypes() and then, for each type, IMFMediaType::GetRepresentation(AM_MEDIA_TYPE_REPRESENTATION, AM_MEDIA_TYPE ** type_rep). As expected, type_rep contained the MPEGLAYER3WAVEFORMAT extension of WAVEFORMATEX with the proper size.

void add_mp3_format(std::vector<audio::WaveFormat> & formats, IMFMediaType * const output_type) {
    AM_MEDIA_TYPE * type_rep = nullptr;
    HRESULT hr = output_type->GetRepresentation(AM_MEDIA_TYPE_REPRESENTATION, (void **)&type_rep);
    if (SUCCEEDED(hr)) {
        if (type_rep && type_rep->formattype == FORMAT_WaveFormatEx && type_rep->pbFormat && type_rep->cbFormat >= sizeof(WAVEFORMATEX)) {
            WAVEFORMATEX const * const wfx = (WAVEFORMATEX *)type_rep->pbFormat;
            if (wfx->wFormatTag == WAVE_FORMAT_MPEGLAYER3 && wfx->cbSize == MPEGLAYER3_WFX_EXTRA_BYTES) {
                MPEGLAYER3WAVEFORMAT const * const mp3 = (LPMPEGLAYER3WAVEFORMAT)wfx;
                formats.emplace_back(*mp3, L"MP3");
            //} else if (wfx->wFormatTag == WAVE_FORMAT_MPEG_HEAAC && wfx->cbSize >= sizeof(HEAACWAVEINFO) - sizeof(WAVEFORMATEX)) {
            //  HEAACWAVEINFO const * const aac = (LPHEAACWAVEINFO)wfx;
            //  formats.emplace_back(*aac, L"AAC");
            }
        }
        output_type->FreeRepresentation(AM_MEDIA_TYPE_REPRESENTATION, (void *)type_rep);
    }
}

Later, I used these format structs to set the output types for the encoder, using MFInitMediaTypeFromWaveFormatEx() and IMFSinkWriter::AddStream(). This worked fine.

Now I wanted to add the AAC formats available from the built-in codec, using the same functions. Unfortunately, that failed because of what appears to be a bug either in the Media Foundation lib or in the codec itself.

As with MPEGLAYER3WAVEFORMAT above, I expected HEAACWAVEINFO or even HEAACWAVEFORMAT structs. But IMFMediaType::GetRepresentation() omits all the AAC-specific info and returns only the plain old WAVEFORMATEX part. Its cbSize member should be at least 12, but it is 0. Because of that, MFInitMediaTypeFromWaveFormatEx() fails and encoding is not possible. Hoping for the best and reinterpreting the returned structs as HEAACWAVEINFO doesn't help either: the required info really is missing.
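For reference, the expected value of 12 is just sizeof(HEAACWAVEINFO) - sizeof(WAVEFORMATEX). The sketch below checks that arithmetic without the Windows SDK. WaveFormatExMirror, HeaacWaveInfoMirror, kHeaacExtraBytes and carries_heaac_info are hypothetical stand-ins introduced only for illustration. They assume the field order and 1-byte packing that mmreg.h uses for the real structs.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirrors of WAVEFORMATEX and HEAACWAVEINFO (assumption: same
// field order and 1-byte packing as mmreg.h). Not the real SDK types.
#pragma pack(push, 1)
struct WaveFormatExMirror {
    std::uint16_t wFormatTag;
    std::uint16_t nChannels;
    std::uint32_t nSamplesPerSec;
    std::uint32_t nAvgBytesPerSec;
    std::uint16_t nBlockAlign;
    std::uint16_t wBitsPerSample;
    std::uint16_t cbSize;          // number of extra bytes after this header
};

struct HeaacWaveInfoMirror {
    WaveFormatExMirror wfx;
    std::uint16_t wPayloadType;
    std::uint16_t wAudioProfileLevelIndication;
    std::uint16_t wStructType;
    std::uint16_t wReserved1;
    std::uint32_t dwReserved2;
};
#pragma pack(pop)

// Extra bytes that cbSize has to announce for a full HEAACWAVEINFO.
constexpr std::size_t kHeaacExtraBytes =
    sizeof(HeaacWaveInfoMirror) - sizeof(WaveFormatExMirror);

static_assert(sizeof(WaveFormatExMirror) == 18, "packed header is 18 bytes");
static_assert(kHeaacExtraBytes == 12, "AAC extension adds 12 bytes");

// True only if the format blob (cb_format bytes in total) is large enough and
// its header announces enough extra bytes to hold the AAC-specific fields.
constexpr bool carries_heaac_info(std::size_t cb_format, std::uint16_t cb_size) {
    return cb_format >= sizeof(HeaacWaveInfoMirror) && cb_size >= kHeaacExtraBytes;
}
```

With these definitions, carries_heaac_info(18, 0) is false, which corresponds to the bare WAVEFORMATEX with cbSize == 0 described above. A complete 30-byte blob with cbSize == 12 passes the check.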

How can I get the missing AAC format info and make the encoder work?


Solution

  • Instead of relying on IMFMediaType::GetRepresentation(AM_MEDIA_TYPE_REPRESENTATION, AM_MEDIA_TYPE ** type_rep), you have to use the IMFAttributes interface of the enumerated IMFMediaType to extract the necessary info manually and then fill a HEAACWAVEINFO with it.

    The MF_MT_AAC_* attributes are the parts that GetRepresentation() leaves out. There should also be a blob field MF_MT_USER_DATA, representing HEAACWAVEFORMAT::pbAudioSpecificConfig or even the whole extra info in HEAACWAVEINFO, but it's not there. Perhaps the blob field MF_MT_MPEG4_SAMPLE_DESCRIPTION is meant to represent HEAACWAVEFORMAT::pbAudioSpecificConfig, but that's unclear. Either way, encoding works even without that part. If you need it, there's more info about that last part here.

    void add_aac_format(std::vector<audio::WaveFormat> & formats, IMFMediaType * const output_type) {
        UINT32 pt = 0, pli = 0, nc = 0, sps = 0, apbs = 0, ba = 0, bps = 0;
        if (SUCCEEDED(output_type->GetUINT32(MF_MT_AUDIO_NUM_CHANNELS, &nc)) &&
            SUCCEEDED(output_type->GetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, &sps)) &&
            SUCCEEDED(output_type->GetUINT32(MF_MT_AUDIO_AVG_BYTES_PER_SECOND, &apbs)) &&
            SUCCEEDED(output_type->GetUINT32(MF_MT_AUDIO_BLOCK_ALIGNMENT, &ba)) &&
            SUCCEEDED(output_type->GetUINT32(MF_MT_AUDIO_BITS_PER_SAMPLE, &bps)) &&
            SUCCEEDED(output_type->GetUINT32(MF_MT_AAC_PAYLOAD_TYPE, &pt)) &&
            SUCCEEDED(output_type->GetUINT32(MF_MT_AAC_AUDIO_PROFILE_LEVEL_INDICATION, &pli)))
        {
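            // Zero-initialize so wStructType and the reserved fields stay 0.
            HEAACWAVEINFO aac_wi {0};
            aac_wi.wfx.wFormatTag = WAVE_FORMAT_MPEG_HEAAC;
            // The attributes are UINT32; narrow explicitly where the struct fields are WORD.
            aac_wi.wfx.nChannels = static_cast<WORD>(nc);
            aac_wi.wfx.nSamplesPerSec = sps;
            aac_wi.wfx.nAvgBytesPerSec = apbs;
            aac_wi.wfx.nBlockAlign = static_cast<WORD>(ba);
            aac_wi.wfx.wBitsPerSample = static_cast<WORD>(bps);
            // Size of the AAC-specific part that follows the WAVEFORMATEX header.
            aac_wi.wfx.cbSize = sizeof(HEAACWAVEINFO) - sizeof(WAVEFORMATEX);
            aac_wi.wPayloadType = static_cast<WORD>(pt);
            aac_wi.wAudioProfileLevelIndication = static_cast<WORD>(pli);
            formats.emplace_back(aac_wi, L"AAC");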
        }
    }