
Mixing and Adding Silence to Audio Android/Java


I have 2 files. One is an mp3 being decoded to pcm into a stream, and I also have a wav being read into pcm. The samples are held in a short data type.

Audio stats: 44,100 samples * 16 bits per sample * 2 channels = 1,411,200 bits/sec

I have X seconds of silence that I need to apply to the beginning of the mp3 pcm data and I am doing it like this:

private short[] mp3Buffer = null;
private short[] wavBuffer = null;
private short[] mixedBuffer = null;

double silenceSamples = (audioInfo.rate * padding) * 2;
for (int i = 0; i < minBufferSize; i++){

    if (silenceSamples > 0 ){

        mp3Buffer[i] = 0; //Add 0 to the buffer as silence

        mixedBuffer[i] = (short)((mp3Buffer[i] + stereoWavBuffer[i])/2);  
        silenceSamples = silenceSamples - 0.5;
    }
    else
        mixedBuffer[i] = (short)((mp3Buffer[i] + stereoWavBuffer[i])/2);
}

The audio is always off. Sometimes it's a second or two too fast, sometimes it's a second or two too slow. I don't think it's a problem with the timing, as I start the audiorecord (wav) first and then set a start timer -> start mediaplayer (already prepared) -> end timer, setting the difference to the "padding" variable. I am also skipping the 44-byte wav header.

Any help would be much appreciated.


Solution

  • I'm assuming you are wanting to align two sources of audio in some way by inserting padding at the start of one of the streams? There are a few things wrong here.

    mp3Buffer[i] = 0; //Add 0 to the buffer as silence
    

    This is not adding silence to the beginning; it is just setting the entry at offset [i] in the array to 0. The next line:

    mixedBuffer[i] = (short)((mp3Buffer[i] + stereoWavBuffer[i])/2);
    

    Then just overwrites this value.

    If you are wanting to align the streams in some way, the best way to go about it is not to insert silence at the beginning of either stream, but to just begin mixing in one of the streams at an offset from the other. Also, it would be better to mix them into a 32-bit float and then normalise. Something like:

    int silenceSamples = (audioInfo.rate * padding) * 2;
    float[] mixedBuffer = new float[minBufferSize + silenceSamples];
    for (int i = 0; i < minBufferSize + silenceSamples; i++) {
        if (i < silenceSamples) {
            // Only the wav plays during the padding period
            mixedBuffer[i] = (float) stereoWavBuffer[i];
        } else if (i < minBufferSize) {
            // Both streams overlap here: mix them
            mixedBuffer[i] = (float) (stereoWavBuffer[i] + mp3Buffer[i - silenceSamples]);
        } else {
            // Only the offset mp3 remains
            mixedBuffer[i] = (float) (mp3Buffer[i - silenceSamples]);
        }
    }
    

    To normalise the data you need to run through the mixedBuffer and find the absolute largest value (Math.abs(...)), and then multiply all the values in the array by 32,767/largestValue - this will give you a buffer where the largest value fits back into a short without clipping. Then iterate through your float array, moving each value back into a short array.
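    That normalisation step could be sketched like this (a minimal illustration - the class and method names are made up for the example, and it assumes the float mix buffer produced above):

        // Hypothetical helper: scale a float mix buffer back into 16-bit
        // shorts so the loudest sample just fits without clipping.
        public class MixNormalizer {
            public static short[] normalizeToShorts(float[] mixedBuffer) {
                // Find the absolute largest value; start at 1 to avoid
                // dividing by zero on an all-silent buffer.
                float largest = 1.0f;
                for (float v : mixedBuffer) {
                    largest = Math.max(largest, Math.abs(v));
                }
                float scale = 32767f / largest;
                // Copy each scaled value back into a short array.
                short[] out = new short[mixedBuffer.length];
                for (int i = 0; i < mixedBuffer.length; i++) {
                    out[i] = (short) (mixedBuffer[i] * scale);
                }
                return out;
            }
        }

    For example, a buffer containing 65534.0f (two full-scale shorts summed) would be scaled by 0.5, landing back at 32767.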

    I'm not sure what your minBufferSize is - this will need to be large enough to get all your data mixed.