
How does Audacity mix audio samples?


So let's say I want to mix these 2 audio tracks:

Unmixed

In Audacity, I can use the "Mix and Render" option to mix them together, and I'll get this:

Audacity Mix

However, when I try to write my own code to mix, I get this:

My Mix

This is essentially how I mix the samples:

private function mixSamples(sample1:UInt, sample2:UInt):UInt
{
    return (sample1 + sample2) & 0xFF;
}

(The syntax is Haxe but it should be easy to follow if you don't know it.)

These are 8-bit sample audio files, and I want the product to be 8-bit as well, hence the & 0xFF.

I do understand that by simply adding the samples, I should expect clipping. My issue is that mixing in Audacity doesn't cause clipping (at least not to the extent that my code does), and by looking at the "tail" of the second (longer) track, it doesn't seem to reduce the amplitude. It doesn't sound any softer either.

So basically, my question is this: what's Audacity doing that I'm not? I want to mix tracks to sound exactly as if they're being played on top of one another, but I (obviously) don't want this horrendous clipping.

EDIT:

Here is what I get if I sign the values before I add, then unsign the sum value, as suggested by Radiodef:

My Signed Mix

As you can see, it's much better than before, but it is still quite distorted and noisy compared to the result Audacity produces. So my problem still stands: Audacity must be doing something differently.

EDIT2:

I mixed the first track on itself, both with my code and Audacity, and compared the points where distortion occurs. This is Audacity's result:

Zoom Audacity

And this is my result:

(image: zoomed view of my mix)


Solution

  • I think what is happening is that you are summing the samples as unsigned values. A typical sound wave swings both positive and negative, which is why waves add together the way they do (some parts cancel). If you have an 8-bit sample that is -96 and another that is 96, their sum is 0. With unsigned audio, those same samples are stored as 32 and 224, and their sum is 256: the offset is counted twice and the result overflows 8 bits.

    What you need to do is sign them before summing. To sign 8-bit samples, convert them to a signed int type and subtract 128 from each one. I assume what you have are WAV files (8-bit WAV is unsigned), so you will need to unsign the result again after the sum by adding 128 back.
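    As a rough illustration of the signing step, here is a minimal Java sketch. The class and method names (SignedSum, toSigned, toUnsigned, sumSigned) are made up for this example, and it assumes 8-bit unsigned samples held in int values from 0 to 255. It does not yet handle sums that fall outside the 8-bit range.

```java
// Hypothetical helper names; assumes 8-bit unsigned samples (0..255) held in ints.
final class SignedSum {

    // Remove the 128 offset so that silence sits at 0.
    static int toSigned(int unsignedSample) {
        return unsignedSample - 128;
    }

    // Put the 128 offset back for storage as unsigned 8-bit.
    static int toUnsigned(int signedSample) {
        return signedSample + 128;
    }

    // Sum in the signed domain. No range check here: the result can
    // leave 0..255 when the signed sum leaves -128..127.
    static int sumSigned(int a, int b) {
        return toUnsigned(toSigned(a) + toSigned(b));
    }

    public static void main(String[] args) {
        // 32 and 224 are -96 and 96 once signed.
        System.out.println((32 + 224) & 0xFF);   // unsigned sum with mask
        System.out.println(sumSigned(32, 224));  // signed sum, then unsigned again
    }
}
```

    With 32 and 224, the masked unsigned sum is 0 (the most negative value), while the signed sum comes back as 128, which is the unsigned code for silence.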

    Audacity probably does floating point processing. I've heard some really dubious claims about floating point, like that it has "infinite dynamic range", and garbage like that. It does not, but it also doesn't clip in the same determinate and obvious way as integers do. Floating point has a finite range of values, same as integers, but the largest and smallest values are much farther apart. (That's about the simplest way to put it.) Floating point allows much greater amplitude changes in the audio. The catch is that its precision is relative to the size of the value, so a 32-bit float resolves a near-full-scale signal less finely (about 24 bits) than a 32-bit integer does.
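    To make the floating point idea concrete, here is a minimal Java sketch of mixing in float and converting back to 8-bit only at the end. This is not Audacity's actual code; the names (FloatMix, toFloat, toUnsigned8, mix) and the gain parameter are assumptions for illustration.

```java
// A sketch only, not Audacity's implementation. All names are hypothetical.
final class FloatMix {

    // Map unsigned 8-bit (0..255) to a float in [-1.0, 1.0).
    static float toFloat(int unsignedSample) {
        return (unsignedSample - 128) / 128.0f;
    }

    // Map a float back to unsigned 8-bit, clipping anything outside full scale.
    static int toUnsigned8(float value) {
        int signed = Math.round(value * 128.0f);
        signed = Math.max(-128, Math.min(127, signed));
        return signed + 128;
    }

    // Sum in float; the intermediate value may exceed 1.0 without wrapping.
    // The gain is applied before the single conversion back to 8-bit.
    static int mix(int a, int b, float gain) {
        float sum = toFloat(a) + toFloat(b);
        return toUnsigned8(sum * gain);
    }
}
```

    For two loud samples of 253 (signed 125), mix(253, 253, 1.0f) clips to 255, while mix(253, 253, 0.5f) gives 253, because the over-range sum survives intact until the gain is applied.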

    With the weird distortion, my best guess is that it comes from the mask you are doing with & 0xFF. The mask does not clip; it wraps around. If you want to actually clip instead of getting overflow, you will need to do so yourself.

    for (int i = 0; i < samplesLength; i++) {
        if (samples[i] > 127) {
            samples[i] = 127;
        } else if (samples[i] < -128) {
            samples[i] = -128;
        }
    }
    

    Otherwise, say you have two samples that are 125: summing gets you 250 (11111010). Then you unsign (add 128) and get 378 (101111010). The & 0xFF keeps only the low 8 bits, 01111010, which is 122. Other sums can wrap to results that are effectively negative or close to 0.
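    Putting the signing and clipping steps together for the 125 + 125 case, a minimal Java sketch might look like this. The class and method names are invented for illustration, and it assumes unsigned 8-bit input held in ints.

```java
// Hypothetical names; assumes unsigned 8-bit samples (0..255) held in ints.
final class ClampMix {

    // Sign, sum, then unsign and mask: overflow wraps around.
    static int mixWithMask(int a, int b) {
        int sum = (a - 128) + (b - 128);
        return (sum + 128) & 0xFF;
    }

    // Sign, sum, clip to signed 8-bit full scale, then unsign.
    static int mixWithClip(int a, int b) {
        int sum = (a - 128) + (b - 128);
        if (sum > 127) {
            sum = 127;
        } else if (sum < -128) {
            sum = -128;
        }
        return sum + 128;
    }

    public static void main(String[] args) {
        // 253 unsigned is 125 signed.
        System.out.println(mixWithMask(253, 253)); // wraps
        System.out.println(mixWithClip(253, 253)); // clips at full scale
    }
}
```

    Here mixWithMask(253, 253) returns 122, a value near silence in the middle of what should be a loud peak, while mixWithClip(253, 253) returns 255.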

    If you want to clip at something other than 8-bit, full scale for a bit depth n is 2^(n - 1) - 1 on the positive side and -2^(n - 1) on the negative side (^ meaning exponentiation here), so for example 32767 and -32768 for 16-bit.
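    The full-scale limits can be computed with shifts rather than pow. The following is a small Java sketch; FullScale and its methods are hypothetical names, not part of any library.

```java
// Hypothetical helper; intended for bit depths 1..63 because it uses long.
final class FullScale {

    // Largest positive sample value: 2^(bits - 1) - 1.
    static long positive(int bits) {
        return (1L << (bits - 1)) - 1;
    }

    // Most negative sample value: -2^(bits - 1).
    static long negative(int bits) {
        return -(1L << (bits - 1));
    }

    // Clip a sample to full scale for the given bit depth.
    static long clip(long sample, int bits) {
        return Math.max(negative(bits), Math.min(positive(bits), sample));
    }
}
```

    For example, FullScale.clip(40000, 16) returns 32767 and FullScale.clip(-250, 8) returns -128.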

    Another thing you can do instead of clipping is to search for clipping and normalize. Something like:

    // Scales the whole buffer down only if at least one sample is outside
    // full scale for destBits; otherwise the samples are left as they are.
    double[] normalize(double[] samples, int length, int destBits) {

        double fsNeg = -Math.pow(2, destBits - 1);
        double fsPos = -fsNeg - 1;

        double peak = 0;

        for (int i = 0; i < length; i++) {
            // find highest clip if there is one

            if (samples[i] < fsNeg || samples[i] > fsPos) {
                double magnitude = Math.abs(samples[i]);

                if (magnitude > peak) {
                    peak = magnitude;
                }
            }
        }

        if (peak != 0) {

            // ratio to reduce to where there is not a clip
            // (fsPos rather than -fsNeg, so a positive peak lands on fsPos)
            double norm = fsPos / peak;

            for (int i = 0; i < length; i++) {
                samples[i] *= norm;
            }
        }

        return samples;
    }