Search code examples
audiocore-audioaudiotoolbox

Basic math behind mixing audio channels


I have an app where I flick the touchscreen and unleash a dot which animates across the screen, reads the pixel color under is, and converts that to audio based on some parameters. This is working great for the most part.

Currently I'm creating one audio channel per dot (iPhone AudioComponent). This works good till I get up to about 15 dots then starts getting "choppy". Dropping audio in/out, etc...

I think if I were to mix the waveform of all of these channels together, then send that waveform out to maybe one or two channels, I could get much better performance for high numbers of dots. This is where I'm looking for advice.

I am assuming for any time t, I can take ((f1(x) + f2(x)) / 2.0). Is this a typical approach to mixing audio signals? This way I can never exceed (normalized) 1.0 .. -1.0, however I'm worried that I'll get the opposite of that; quiet audio. Maybe it won't matter so much if there are so many dots.

If someone can drop the name of any technique for this, I'll go read up on it. Or, any links would be great.


Solution

  • I know this is way too late to answer this but someone may be doing something similar and looking to these responses to help them.

    There are classically two answers to the challenge of getting the levels right when mixing (summing) multiple audio sources. This is because it's a vector problem and the answer is different depending on whether the sounds are coherent or not.

    If the two sources are coherent, then you would divide by the number of channels. In other words, for ten channels you sum them all and divide by 10 (attenuate by 20dB). For all ten channels to be coherent though, they all have to be carrying the same signal. Generally, that makes no sense - why would ten channels carry the same signal?

    There is one case though where coherence is common, where you are summing left and right from a stereo pair. In many cases these two separate signals are closer to coherent, closer to identical, than not.

    If the channels are not coherent, then the volume will increase not by the number of sources, but by the square root of the number of sources. For ten sources this means the sum would be 3.16 times as big as each of the sources (assuming that they are all the same level). This corresponds to an attenuation of 10dB. So, to sum 10 channels of different sounds (all of the same loudness) you should attenuate everything by 10dB.

    10dB = 20 x log(3.16) where 3.16 is the square root of 10.

    There's a practical part to this as well. We assumed that the channels are all equally loud, but what if they aren't? Quite often you have some channels that are similar and others that are quieter. Like say adding voices plus background music - where the music is quieter than the voices. As a rule of thumb, you can ignore the quieter channels. So, assume there are four voice channels and two quieter music channels. We start by ignoring the music channels which leaves four incoherent voice channels. The square root of four is two, so in this case we halve the audio level - attenuate it by 6dB.