
Core Audio: Float32 to SInt16 conversion artefacts


I am converting from the following format:

const int four_bytes_per_float = 4;
const int eight_bits_per_byte = 8;
_stereoGraphStreamFormat.mFormatID          = kAudioFormatLinearPCM;
_stereoGraphStreamFormat.mFormatFlags       = kAudioFormatFlagsNativeFloatPacked | kAudioFormatFlagIsNonInterleaved;
_stereoGraphStreamFormat.mBytesPerPacket    = four_bytes_per_float;
_stereoGraphStreamFormat.mFramesPerPacket   = 1;
_stereoGraphStreamFormat.mBytesPerFrame     = four_bytes_per_float;
_stereoGraphStreamFormat.mChannelsPerFrame  = 2;     
_stereoGraphStreamFormat.mBitsPerChannel    = eight_bits_per_byte * four_bytes_per_float;
_stereoGraphStreamFormat.mSampleRate        = 44100;

to the following format:

interleavedAudioDescription.mFormatID          = kAudioFormatLinearPCM;
interleavedAudioDescription.mFormatFlags       = kAudioFormatFlagIsSignedInteger;
interleavedAudioDescription.mChannelsPerFrame  = 2;
interleavedAudioDescription.mBytesPerPacket    = sizeof(SInt16)*interleavedAudioDescription.mChannelsPerFrame;
interleavedAudioDescription.mFramesPerPacket   = 1;
interleavedAudioDescription.mBytesPerFrame     = sizeof(SInt16)*interleavedAudioDescription.mChannelsPerFrame;
interleavedAudioDescription.mBitsPerChannel    = 8 * sizeof(SInt16);
interleavedAudioDescription.mSampleRate        = 44100;

Using the following code:

int32_t availableBytes = 0;

void* tailL = TPCircularBufferTail(inputBufferL(), &availableBytes);
void* tailR = TPCircularBufferTail(inputBufferR(), &availableBytes);

// If we have no data in the buffer, we simply return
if (availableBytes <= 0)
{
    return;
}

// ========== Non-Interleaved to Interleaved (Plus Samplerate Conversion) =========

// Get the number of frames available
UInt32 frames = availableBytes / this->mInputFormat.mBytesPerFrame;

pcmOutputBuffer->mBuffers[0].mDataByteSize = frames * interleavedAudioDescription.mBytesPerFrame;

struct complexInputDataProc_t data = (struct complexInputDataProc_t) { .self = this, .sourceL = tailL, .sourceR = tailR, .byteLength = availableBytes };

// Do the conversion
OSStatus result = AudioConverterFillComplexBuffer(interleavedAudioConverter,
                                                  complexInputDataProc,
                                                  &data,
                                                  &frames,
                                                  pcmOutputBuffer,
                                                  NULL);

// Tell the buffers how much data we consumed during the conversion so that it can be removed
TPCircularBufferConsume(inputBufferL(), availableBytes);
TPCircularBufferConsume(inputBufferR(), availableBytes);

// ========== Buffering Of Interleaved Samples =========

// If we got converted frames back from the converter, we want to add it to a separate buffer
if (frames > 0)
{
    // Make sure we have enough space in the buffer to store the new data
    TPCircularBufferHead(&pcmCircularBuffer, &availableBytes);

    if (availableBytes > pcmOutputBuffer->mBuffers[0].mDataByteSize)
    {
        // Add the newly converted data to the buffer
        TPCircularBufferProduceBytes(&pcmCircularBuffer, pcmOutputBuffer->mBuffers[0].mData, frames * interleavedAudioDescription.mBytesPerFrame);
    }
    else
    {
        printf("No Space in Buffer\n");
    }
}

However, I am getting the following output:

[Image: sine wave after conversion]

It should be a perfect sine wave, but as you can see, it is not.

I have been working on this for days now and just can’t seem to find where it is going wrong. Can anyone see something that I might be missing?

Edit:

Buffer initialisation:

TPCircularBuffer        pcmCircularBuffer;
static SInt16           pcmOutputBuf[BUFFER_SIZE];

pcmOutputBuffer = (AudioBufferList*)malloc(sizeof(AudioBufferList));
pcmOutputBuffer->mNumberBuffers = 1;
pcmOutputBuffer->mBuffers[0].mNumberChannels = 2;
pcmOutputBuffer->mBuffers[0].mData = pcmOutputBuf;

TPCircularBufferInit(&pcmCircularBuffer, BUFFER_SIZE);

Complex input data proc:

static OSStatus complexInputDataProc(AudioConverterRef             inAudioConverter,
                                     UInt32                        *ioNumberDataPackets,
                                     AudioBufferList               *ioData,
                                     AudioStreamPacketDescription  **outDataPacketDescription,
                                     void                          *inUserData)
{
    struct complexInputDataProc_t *arg = (struct complexInputDataProc_t*)inUserData;
    BroadcastingServices::MP3Encoder *self = (BroadcastingServices::MP3Encoder*)arg->self;

    if ( arg->byteLength <= 0 )
    {
        *ioNumberDataPackets = 0;
        return 100; //kNoMoreDataErr;
    }

    UInt32 framesAvailable = arg->byteLength / self->interleavedAudioDescription.mBytesPerFrame;

    if (*ioNumberDataPackets > framesAvailable)
    {
        *ioNumberDataPackets = framesAvailable;
    }

    ioData->mBuffers[0].mData = arg->sourceL;
    ioData->mBuffers[0].mDataByteSize = arg->byteLength;

    ioData->mBuffers[1].mData = arg->sourceR;
    ioData->mBuffers[1].mDataByteSize = arg->byteLength;

    arg->byteLength = 0;

    return noErr;
}


Solution

  • I see a few things that raise red flags.

    1) As mentioned in a comment above, you are overwriting the availableBytes value for the left input with the one for the right:

    void* tailL = TPCircularBufferTail(inputBufferL(), &availableBytes);
    void* tailR = TPCircularBufferTail(inputBufferR(), &availableBytes);
    

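    If the two input streams change asynchronously to this code, you almost certainly have a race condition: the two buffers can hold different amounts of data at that moment, and only the right channel's count is kept.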

    2) Truncation errors: availableBytes is not necessarily a multiple of the bytes per frame. If it is not, the following code consumes more bytes than it actually converted, because frames is rounded down while the full availableBytes is consumed.

    void* tailL = TPCircularBufferTail(inputBufferL(), &availableBytes);
    void* tailR = TPCircularBufferTail(inputBufferR(), &availableBytes);
    ...
    UInt32 frames = availableBytes / this->mInputFormat.mBytesPerFrame;
    ...
    TPCircularBufferConsume(inputBufferL(), availableBytes);
    TPCircularBufferConsume(inputBufferR(), availableBytes);
    

    3) If the output buffer does not have room for all of the converted data, you simply throw that data away, even though the input has already been consumed. That happens in this code:

    if (availableBytes > pcmOutputBuffer->mBuffers[0].mDataByteSize)
    {
        ...
    }
    else
    {
        printf("No Space in Buffer\n");
    }
    

    I'd be really curious whether you're seeing that printed output.

    Here is how I would suggest doing it. It is pseudo-code-ish, since I don't have what I need to compile and test it.
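
    To make point 2 concrete, here is a minimal sketch in plain C. The helper name whole_frame_bytes is hypothetical (it is not part of TPCircularBuffer or Core Audio); it rounds a byte count down to a whole number of frames, which is the amount that is safe to consume after a conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round a byte count down to a whole number of frames.
   bytes_per_frame must be non-zero. */
int32_t whole_frame_bytes(int32_t available_bytes, uint32_t bytes_per_frame)
{
    assert(bytes_per_frame > 0);
    if (available_bytes <= 0) {
        return 0;
    }
    /* Integer division drops any partial frame at the end. */
    uint32_t frames = (uint32_t)available_bytes / bytes_per_frame;
    return (int32_t)(frames * bytes_per_frame);
}
```

    With 4-byte frames and 10 bytes available, only 2 frames (8 bytes) are converted, so consuming all 10 bytes would shift every later sample by 2 bytes.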

    int32_t availableBytesInL = 0;
    int32_t availableBytesInR = 0;
    int32_t availableBytesOut = 0;
    
    // figure out how many bytes are available in each buffer.
    void* tailL = TPCircularBufferTail(inputBufferL(), &availableBytesInL);
    void* tailR = TPCircularBufferTail(inputBufferR(), &availableBytesInR);
    TPCircularBufferHead(&pcmCircularBuffer, &availableBytesOut);
    
    // figure out how many full frames are available
    UInt32 framesInL = availableBytesInL / mInputFormat.mBytesPerFrame;
    UInt32 framesInR = availableBytesInR / mInputFormat.mBytesPerFrame;
    UInt32 framesOut = availableBytesOut / interleavedAudioDescription.mBytesPerFrame;
    
    // figure out how many frames to process this time.
    UInt32 frames = min(min(framesInL, framesInR), framesOut);
    
    if (frames == 0)
        return;
    
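    int32_t bytesConsumed = frames * mInputFormat.mBytesPerFrame;
    
    // Reset the output buffer size before each call, as the original code does
    pcmOutputBuffer->mBuffers[0].mDataByteSize = frames * interleavedAudioDescription.mBytesPerFrame;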
    
    struct complexInputDataProc_t data = (struct complexInputDataProc_t) {
        .self = this, .sourceL = tailL, .sourceR = tailR, .byteLength = bytesConsumed };
    
    // Do the conversion
    OSStatus result = AudioConverterFillComplexBuffer(interleavedAudioConverter,
                                                      complexInputDataProc,
                                                      &data,
                                                      &frames,
                                                      pcmOutputBuffer,
                                                      NULL);
    
    int32_t bytesProduced = frames * interleavedAudioDescription.mBytesPerFrame;
    
    // Tell the buffers how much data we consumed during the conversion so that it can be removed
    TPCircularBufferConsume(inputBufferL(), bytesConsumed);
    TPCircularBufferConsume(inputBufferR(), bytesConsumed);
    TPCircularBufferProduceBytes(&pcmCircularBuffer, pcmOutputBuffer->mBuffers[0].mData, bytesProduced);
    

    Basically, what I've done here is figure out up front how many frames should be processed, making sure I only process as many frames as the output buffer can hold. If it were me, I'd also add some checking for buffer underruns on the output and buffer overruns on the input. Finally, I'm not exactly sure of the semantics of AudioConverterFillComplexBuffer with respect to the frame-count parameter that is passed in and out. It's conceivable that the number of frames out would be less or more than the number of frames in, although since you're not doing sample rate conversion that's probably not going to happen. I've attempted to account for that case by assigning bytesProduced after the conversion.
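
    The frame-budget calculation can be checked in isolation. The sketch below is plain C with a hypothetical helper name, frames_to_process, and it assumes one output frame per input frame (no sample rate conversion), as in the question.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of frames that can be converted in one pass
   without reading past either input buffer or overfilling the output buffer.
   Assumes one output frame per input frame (no sample rate conversion). */
uint32_t frames_to_process(int32_t bytes_in_l, int32_t bytes_in_r,
                           int32_t bytes_out_free,
                           uint32_t in_bytes_per_frame,
                           uint32_t out_bytes_per_frame)
{
    assert(in_bytes_per_frame > 0 && out_bytes_per_frame > 0);
    if (bytes_in_l <= 0 || bytes_in_r <= 0 || bytes_out_free <= 0) {
        return 0;
    }
    /* Whole frames available on each side; partial frames are ignored. */
    uint32_t frames_l   = (uint32_t)bytes_in_l / in_bytes_per_frame;
    uint32_t frames_r   = (uint32_t)bytes_in_r / in_bytes_per_frame;
    uint32_t frames_out = (uint32_t)bytes_out_free / out_bytes_per_frame;

    /* The smallest of the three limits the pass. */
    uint32_t frames = frames_l < frames_r ? frames_l : frames_r;
    return frames < frames_out ? frames : frames_out;
}
```

    If this returns 0, nothing should be consumed or produced on that pass. Multiplying the result by the input and output frame sizes gives the byte counts to consume and produce.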

    Hope this helps. If not, you have two other clues: the dropouts are periodic, and they all look to be about the same size. If you can figure out how many samples each one spans, you can look for those numbers in your code.
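
    One way to measure the dropouts is to scan the converted samples for jumps that a clean sine wave could not produce. The sketch below is plain C; the function name find_jumps and its parameters are hypothetical, and the threshold is an assumption you would pick yourself. For a sine of amplitude A and frequency f sampled at rate fs, consecutive samples differ by at most about 2·π·A·f / fs, so a value somewhat above that is a reasonable starting point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical debugging helper: scan one channel of 16-bit PCM and record
   the frame indices where consecutive samples differ by more than max_step.
   'count' is the total number of int16 values; 'stride' is the distance
   between samples of the same channel (2 for interleaved stereo).
   Returns the number of jumps found; at most max_out indices are stored. */
size_t find_jumps(const int16_t *samples, size_t count, size_t stride,
                  int32_t max_step, size_t *out_indices, size_t max_out)
{
    assert(stride > 0);
    size_t found = 0;
    for (size_t i = stride; i < count; i += stride) {
        /* Widen to 32 bits so the difference cannot overflow. */
        int32_t delta = (int32_t)samples[i] - (int32_t)samples[i - stride];
        if (delta < 0) {
            delta = -delta;
        }
        if (delta > max_step) {
            if (found < max_out) {
                out_indices[found] = i / stride;  /* frame index of the jump */
            }
            found++;
        }
    }
    return found;
}
```

    The spacing between successive indices gives the dropout period in frames, which can then be compared against the buffer sizes and frame counts used in the code.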