Why does my sound recording made with WinAPI C++ not play back properly in Audacity?


I'm trying to record sound from the microphone, but I'm having trouble. I have tried several approaches and none of them work. I created a small test project whose code will later be integrated into a larger project. Here is the code of the test project:

#include <iostream>
#include <fstream>
#include <Windows.h>
#include <dshow.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <ks.h>
#include <ksmedia.h>

#pragma comment(lib, "mfplat")
#pragma comment(lib, "mf")
#pragma comment(lib, "mfreadwrite")
#pragma comment(lib, "mfuuid")
#pragma comment(lib, "strmbase")

int main() {
    HRESULT hr = MFStartup(MF_VERSION);

    IMFMediaSource* pSoundSource = NULL;
    IMFAttributes* pSoundConfig = NULL;
    IMFActivate** ppSoundDevices = NULL;

    hr = MFCreateAttributes(&pSoundConfig, 1);
    if (FAILED(hr)) {
        std::cout << "Failed to create attribute store";
    }

    hr = pSoundConfig->SetGUID(MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE, MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_AUDCAP_GUID);


    UINT32 count;
    hr = MFEnumDeviceSources(pSoundConfig, &ppSoundDevices, &count);
    if (FAILED(hr)) {
        std::cout << "Failed to enumerate capture devices";
    }

    hr = ppSoundDevices[0]->ActivateObject(IID_PPV_ARGS(&pSoundSource));
    if (FAILED(hr)) {
        std::cout << "Failed to connect microphone to source";
    }

    IMFSourceReader* pSoundReader;
    hr = MFCreateSourceReaderFromMediaSource(pSoundSource, pSoundConfig, &pSoundReader);
    if (FAILED(hr)) {
        std::cout << "Failed to create source reader";
    }

    /*This part is for getting the audio format that the microphone outputs*/
    /*______________________*/
    IMFMediaType* pSoundType = NULL;
    DWORD dwMediaTypeIndex = 0;
    DWORD dwStreamIndex = 0;
    hr = pSoundReader->GetNativeMediaType(dwStreamIndex, dwMediaTypeIndex, &pSoundType);
    LPVOID soundRepresentation;
    pSoundType->GetRepresentation(AM_MEDIA_TYPE_REPRESENTATION, &soundRepresentation);
    GUID subSoundType = ((AM_MEDIA_TYPE*)soundRepresentation)->subtype;
    BYTE* pbSoundFormat = ((AM_MEDIA_TYPE*)soundRepresentation)->pbFormat;
    GUID soundFormatType = ((AM_MEDIA_TYPE*)soundRepresentation)->formattype;
    if (soundFormatType == FORMAT_WaveFormatEx) { std::cout << 8; }
    WAVEFORMATEXTENSIBLE* soundFormat = (WAVEFORMATEXTENSIBLE*)pbSoundFormat;
    std::cout << std::endl;
    std::cout << soundFormat->Format.wFormatTag << std::endl;
    std::cout << soundFormat->Format.nChannels << std::endl;
    std::cout << soundFormat->Format.nBlockAlign << std::endl;
    std::cout << soundFormat->Format.nSamplesPerSec << std::endl;
    std::cout << soundFormat->Format.wBitsPerSample << std::endl;
    std::cout << soundFormat->Format.cbSize << std::endl;
    if (soundFormat->SubFormat == KSDATAFORMAT_SUBTYPE_IEEE_FLOAT)
        std::cout << "IEEE-FLOAT!" << std::endl;
    /*_____________________*/

    DWORD streamIndex, flags;
    LONGLONG llTimeStamp;
    IMFSample* pSoundSample;
    while (true) {
        hr = pSoundReader->ReadSample(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, &streamIndex, &flags, &llTimeStamp, &pSoundSample);
        if (FAILED(hr)) {
            std::cout << "Failed to get sound from microphone";
        }

        if (pSoundSample != NULL) {
            IMFMediaBuffer* pSoundBuffer;
            pSoundSample->ConvertToContiguousBuffer(&pSoundBuffer);
            DWORD soundlength;
            pSoundBuffer->GetCurrentLength(&soundlength);
            unsigned char* sounddata;
            hr = pSoundBuffer->Lock(&sounddata, NULL, &soundlength);
            if (FAILED(hr)) {
                std::cout << "Failed to get sounddata from buffer";
            }

            std::ofstream file;
            file.open("C:\\Users\\user\\Documents\\test.raw", std::ios::app);
            for (unsigned int i = 0; i < soundlength; i++)
                file << sounddata[i];
            file.close();
        }
    }
}

The part of the code that is supposed to determine the format of the data prints the following to the console:

8
65534
1
4
48000
32
22
IEEE-FLOAT!

From this, I determined that the sound is recorded as single-channel, 32-bit, 48000 Hz IEEE float. Now I need to play this sound back. The problem is that most APIs expect 16-bit PCM for sound playback.

I tried converting the sound to 16-bit PCM, but it doesn't work well. If you know how to do that, can you show some code? Also, in the code presented here, I'm appending the sound to a raw audio file without a header. I heard that float samples lie between -1 and 1, so I tried the following code to do the conversion:

void iefloat_to_pcm16(unsigned char* sounddata, std::vector<unsigned char>& newdata, int soundlength) {
    for (int i = 0; i < soundlength && i + 3 < soundlength; i += 4) {
        float f;
        unsigned char b[] = { sounddata[i], sounddata[i + 1], sounddata[i + 2], sounddata[i + 3] };
        memcpy(&f, &b, sizeof(f));
        short pcm16 = f * 32767 + 0.5;
        newdata.push_back((unsigned char)(pcm16 >> 8));
        newdata.push_back((unsigned char)pcm16);
    }
}

This code doesn't seem to work.

After this, I've been using Audacity with File > Import > Raw Data, which lets you import raw data and specify the format the data is in. I selected 1 channel, 32-bit float, 48 kHz, and I tried every endianness option, to no avail. I did the same with the data "converted" to 16-bit PCM. The result is just random noise in Audacity. I can see spikes where I make noise, and the rest is silent, but the spikes themselves are just noise. Is there something I'm doing wrong here?


Solution

  • Audio files are a binary format, but you're writing to the file as text.

    file << sounddata[i];
    

    That's a formatted insertion operator, which is intended for text output rather than raw bytes. Use file.write() instead.

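    You also need to change the flags you use to open the stream: add std::ios::binary. Without it, a stream on Windows is in text mode and writes every 0x0A byte as 0x0D 0x0A, which misaligns all the samples that follow. C++ standard I/O streams are designed primarily for text. Since you are already using Windows API objects extensively, you could also switch to CreateFile / WriteFile, where no conversion facets are active beneath the surface.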
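
As a minimal sketch of the advice above, the following shows an unformatted write through a stream opened with std::ios::binary, plus one possible float-to-PCM16 conversion, since the question asks for one. The helper names append_raw and float32_to_pcm16le are hypothetical and not part of any library. The conversion assumes the buffer holds native-endian 32-bit floats (true on x86/x64 Windows) and, unlike the conversion in the question, clamps out-of-range samples and emits the low byte first (little-endian).

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: append raw bytes to a file with no text-mode translation.
// Returns true if the stream opened and the write succeeded.
bool append_raw(const std::string& path, const unsigned char* data, std::size_t length) {
    std::ofstream file(path, std::ios::binary | std::ios::app);
    if (!file) {
        return false;
    }
    // write() is an unformatted output function: the bytes go out unchanged.
    file.write(reinterpret_cast<const char*>(data), static_cast<std::streamsize>(length));
    return file.good();
}

// Hypothetical helper: convert native-endian 32-bit float samples (nominally in
// [-1, 1]) to signed 16-bit PCM, written low byte first (little-endian).
// Trailing bytes that do not form a whole float are ignored.
std::vector<unsigned char> float32_to_pcm16le(const unsigned char* data, std::size_t length) {
    std::vector<unsigned char> out;
    out.reserve(length / 2);
    for (std::size_t i = 0; i + sizeof(float) <= length; i += sizeof(float)) {
        float f;
        std::memcpy(&f, data + i, sizeof(f));
        // Clamp so that samples outside [-1, 1] cannot overflow the 16-bit range.
        f = std::max(-1.0f, std::min(1.0f, f));
        const std::int16_t sample = static_cast<std::int16_t>(std::lround(f * 32767.0f));
        const std::uint16_t bits = static_cast<std::uint16_t>(sample);
        out.push_back(static_cast<unsigned char>(bits & 0xFF));  // low byte
        out.push_back(static_cast<unsigned char>(bits >> 8));    // high byte
    }
    return out;
}
```

In the capture loop from the question, this would amount to calling append_raw(path, sounddata, soundlength) after Lock succeeds, followed by Unlock on the buffer. If 16-bit output is wanted, pass the result of float32_to_pcm16le to append_raw instead, and import that file in Audacity as signed 16-bit PCM, little-endian, 1 channel, 48000 Hz.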