multithreading, macos, core-audio

CoreAudio: multi-threaded back-end OS X


I'd like to learn how to use multiple CPU cores when rendering audio from a single input parameter array on OS X.

In AudioToolbox, a rendering callback normally lives on a single thread, which seemingly gets processed by a single CPU core.

How can one deal with input data overflow on that core while the other 3, 5 or 7 cores stay practically idle?

Of course, it is not possible to know in advance how many cores will be available on a particular machine. Is there a way of (statically or dynamically) allocating rendering callbacks to different threads or "threadbare blocks"? Is there a way of precisely synchronising the moment at which several rendering callbacks, each running in parallel on its own (highest-priority) thread, produce their audio buffers? Could the GCD API perhaps be of any use?

Thanks in advance!

PS. This question is related to another question I posted a while ago, OSX AudioUnit SMP, with the difference that I now seem to better understand the scope of the problem.


Solution

  • No matter how you set up your audio processing on macOS, whether you write a single render callback or set up a whole application suite, CoreAudio will always provide you with just one realtime audio thread. This thread runs with the highest priority there is, which is the only way the system can give you at least some guarantees about processing time.

    If you really need to distribute load over multiple CPU cores, you need to create your own threads manually and share sample and timing data across them. However, you will not be able to create a thread with the same priority as the system's audio thread, so your additional threads should be considered much "slower" than your audio thread. As a result, the audio thread might have to wait for some other thread(s) longer than it has time available, which results in an audible glitch.

    Long story short, the most crucial part is to design the actual processing algorithm carefully, because in every scenario you really need to know how long each task can take.


    EDIT: My previous answer here was quite different and uneducated. I updated the above parts for anybody coming across this answer in the future, to not be guided in the wrong direction.
    You can find the previous version in the history of this answer.
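
The hand-off described in the answer (one audio thread that farms work out to its own worker threads and risks a glitch if they are late) can be sketched roughly as follows. This is a minimal, portable C++ sketch and not a CoreAudio API: `ParallelRenderer`, `SampleFn`, `render` and the `budget` parameter are all hypothetical names made up for illustration, and the `render` method merely stands in for whatever your real render callback would do. It assumes the per-sample function is safe to call from several threads at once.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper, not a CoreAudio API: a fixed pool of worker threads,
// each filling one slice of a shared buffer on behalf of the audio thread.
class ParallelRenderer {
public:
    using SampleFn = std::function<float(std::size_t)>;

    ParallelRenderer(std::size_t workers, std::size_t frames, SampleFn fn)
        : buffer_(frames, 0.0f), workers_(workers), fn_(std::move(fn)) {
        assert(workers > 0);
        for (std::size_t i = 0; i < workers_; ++i)
            threads_.emplace_back([this, i] { workerLoop(i); });
    }

    ~ParallelRenderer() {
        quit_ = true;
        for (auto& t : threads_) t.join();
    }

    // Call from the single audio thread; `out` must hold `frames` samples.
    // Returns false (and writes silence) if the workers miss the deadline.
    bool render(float* out, std::chrono::microseconds budget) {
        const auto deadline = std::chrono::steady_clock::now() + budget;
        if (!inFlight_) {          // start a new job only if the last one finished
            done_ = 0;
            ++generation_;         // workers notice the change and start rendering
            inFlight_ = true;
        }
        while (done_.load() < workers_) {
            if (std::chrono::steady_clock::now() >= deadline) {
                std::fill(out, out + buffer_.size(), 0.0f);  // the audible glitch
                return false;      // the late job is picked up by the next call
            }
            std::this_thread::yield();
        }
        std::copy(buffer_.begin(), buffer_.end(), out);
        inFlight_ = false;
        return true;
    }

private:
    void workerLoop(std::size_t index) {
        unsigned seen = 0;
        while (!quit_.load()) {
            const unsigned current = generation_.load();
            if (current == seen) { std::this_thread::yield(); continue; }
            seen = current;
            // Each worker owns a disjoint slice, so no locking is needed here.
            const std::size_t begin = index * buffer_.size() / workers_;
            const std::size_t end = (index + 1) * buffer_.size() / workers_;
            for (std::size_t n = begin; n < end; ++n) buffer_[n] = fn_(n);
            ++done_;               // publish this slice as finished
        }
    }

    std::vector<float> buffer_;
    const std::size_t workers_;
    SampleFn fn_;
    std::atomic<unsigned> generation_{0};
    std::atomic<std::size_t> done_{0};
    std::atomic<bool> quit_{false};
    bool inFlight_ = false;        // touched by the audio thread only
    std::vector<std::thread> threads_;
};
```

In a real application you would construct the renderer once, outside the callback, and call `render` from inside the callback with a budget comfortably below the buffer duration (for example, 512 frames at 44.1 kHz last about 11.6 ms). The sketch polls atomics instead of taking locks so that the audio thread never blocks on a mutex, at the cost of workers spinning while idle; whether that trade-off is acceptable depends on your workload.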