Tags: ios, swift, audio, signal-processing, audiokit

AudioKit/DSP: Understanding the difference between the size of uncompressed audio on disk vs. in memory


This is a more generic RTFM DSP question from someone very comfortable with audio production and software, but new to audio software development, regarding the difference in size between uncompressed audio files (WAV, CAF, AIFF; 44.1 kHz sample rate, 16-bit) on disk and the actual float values of that audio in memory.

For example, I have a test WAV file that, according to macOS, is seven minutes and fourteen seconds (7:14) long and 83.4 MB in size.

If I import this file into my project and open it as an AKAudioFile, then inspect the .floatChannelData property (an array of two arrays, one per channel for a stereo file), this particular file comes to roughly 23 million floats per channel, around 180 MB on the heap. This makes sense: Swift's Float is a 32-bit value, i.e. 4 bytes per float, so two channels of 32-bit floats end up roughly double the size of the 16-bit samples on disk.
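
For reference, AKAudioFile subclasses AVAudioFile, so the back-of-the-envelope math can be sketched with plain AVFoundation. The helper below is a rough, hypothetical illustration (not my actual code) of where the roughly 2x growth comes from: 16-bit samples on disk become 4-byte Floats in memory.

```swift
import AVFoundation

/// Rough sketch: compare the 16-bit on-disk size to the 32-bit in-memory size.
func inspectSizes(of url: URL) throws {
    let file = try AVAudioFile(forReading: url)       // processingFormat is 32-bit float PCM
    let frames = AVAudioFrameCount(file.length)       // sample frames per channel
    let channels = Int(file.processingFormat.channelCount)

    // ~16-bit samples on disk (ignoring the header) vs. 4-byte Floats in memory.
    let bytesOnDisk   = Int(frames) * channels * MemoryLayout<Int16>.size
    let bytesInMemory = Int(frames) * channels * MemoryLayout<Float>.size
    print("~\(bytesOnDisk / 1_000_000) MB on disk, ~\(bytesInMemory / 1_000_000) MB as Floats")

    // Reading the whole file materializes that full 32-bit copy at once.
    guard let buffer = AVAudioPCMBuffer(pcmFormat: file.processingFormat,
                                        frameCapacity: frames) else { return }
    try file.read(into: buffer)
    // buffer.floatChannelData: `channels` pointers to `frames` Floats each.
}
```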

I understand where the size comes from, but I'm hoping to work with something closer to 16-bit, because in my application I am simply analyzing this audio, not processing it in any way. Even after some basic optimizations to prevent deep copies, any audio exceeding ten or so minutes ends up using gigabytes of memory on the heap.

According to this SO question there are some novel ways to convert 32-bit to 16-bit, but honestly that feels like the wrong (or overkill) approach for what I want to do. As an example, simply referencing floatChannelData from my AKAudioFile immediately adds around 300 MB to the heap, even without copying, appending, etc.

For the more experienced DSP audio developers out there: are there any resources on good heap/stack management for large floating-point buffers? Can AudioKit record to 16-bit? I am already doing processing in C and C++, so I am comfortable doing any math or conversions there if it's more performant. Any leads would be much appreciated, thank you!


Solution

  • AudioKit uses various third-party DSP routines that require data as 32-bit float arrays. Swift makes copies of Swift arrays when they are referenced or passed as parameters in certain ways, so you are likely stuck with large memory use if you stick to basic Swift coding techniques and the common AudioKit APIs.

    An alternative is to not use the AudioKit API with standard Swift arrays, and convert the data to 32-bit only as needed.

    For instance, you can memory map (mmap) your WAVE file, which lets iOS page the 16-bit data into the VM system as needed, instead of loading it all at once in 32-bit AudioKit format. Then use vDSP to convert only the 16-bit WAVE data slices you need from the mapped file into smaller pre-allocated C float arrays, just the minimum required by the calls to the DSP routines (perhaps the same C code that AudioKit uses internally). Swift usually won't make copies of these pre-allocated C arrays when you pass (mutable unsafe raw) pointers to C routines. A minimal sketch of this approach follows at the end of this answer.

    These techniques can make your app's memory footprint far smaller, use fewer CPU cycles, and help keep your app from running down the iOS device's battery as quickly.
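
    As a rough illustration of that approach, here is a minimal Swift sketch, not AudioKit's own code: it assumes a canonical 44-byte WAV header and 16-bit mono PCM for brevity (real code should walk the RIFF chunks, and interleaved stereo would read one channel with a vDSP stride of 2), and `analyze_slice` is a hypothetical stand-in for whatever C analysis routine you already have.

    ```swift
    import Foundation
    import Accelerate

    /// Sketch: stream a memory-mapped 16-bit WAV through one reusable float buffer.
    /// Assumes a canonical 44-byte header and mono PCM; parse RIFF chunks in real code.
    func analyze(fileAt url: URL, sliceFrames: Int = 4096) throws {
        // .alwaysMapped asks Foundation to mmap the file, so 16-bit pages are
        // faulted in on demand instead of the whole file being loaded up front.
        let mapped = try Data(contentsOf: url, options: .alwaysMapped)
        let headerBytes = 44
        let totalFrames = (mapped.count - headerBytes) / MemoryLayout<Int16>.size

        // One pre-allocated scratch buffer, reused for every slice (no per-slice allocations).
        var floatSlice = [Float](repeating: 0, count: sliceFrames)

        var frame = 0
        while frame + sliceFrames <= totalFrames {
            let byteOffset = headerBytes + frame * MemoryLayout<Int16>.size
            mapped.withUnsafeBytes { (raw: UnsafeRawBufferPointer) in
                let int16Ptr = raw.baseAddress!
                    .advanced(by: byteOffset)
                    .assumingMemoryBound(to: Int16.self)
                floatSlice.withUnsafeMutableBufferPointer { out in
                    // Widen the 16-bit samples to 32-bit floats, then scale to -1.0...1.0.
                    vDSP_vflt16(int16Ptr, 1, out.baseAddress!, 1, vDSP_Length(sliceFrames))
                    var scale = Float(1.0 / 32768.0)
                    vDSP_vsmul(out.baseAddress!, 1, &scale, out.baseAddress!, 1, vDSP_Length(sliceFrames))
                    // Hand the raw pointer to your C routine; Swift does not copy the buffer here.
                    // analyze_slice(out.baseAddress!, Int32(sliceFrames))   // hypothetical C call
                }
            }
            frame += sliceFrames
        }
    }
    ```

    Only the mapped pages that are actually touched plus one small float slice are resident at any time, instead of the full 32-bit copy of the whole file.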