audio benchmarking librosa pitch-shifting

Does torchlibrosa or librosa perform better for realtime audio processing?

I'm looking at doing some realtime audio processing (more specifically, pitch shifting). Does Librosa or Torchlibrosa (https://github.com/qiuqiangkong/torchlibrosa) perform better at this in Python? Alternatively, what are some good benchmarks or algorithms to test this?

I know that Python isn't naturally suited to realtime applications, but I need to use it for this project. I am unsure how to benchmark it quantitatively.

Solution

Real-time systems are benchmarked by measuring the degree to which the system meets its real-time deadline, including under adverse conditions. For a basic audio system, the main deadline is set by the audio I/O subsystem, which invokes a callback with fixed-size audio buffers containing the input and output. The processing may also use internal buffers, potentially with flexible buffering such as a queue/FIFO. If the processing is not able to fill the output buffer in time, it has failed to meet the deadline (buffer underrun).
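To make the deadline concrete: the time available per callback is the buffer size divided by the sample rate. The snippet below is only an illustrative calculation; the function name and the 48 kHz sample rate are assumptions chosen for the example, not something fixed by either library.

```python
def buffer_deadline_seconds(buffer_size: int, sample_rate: int) -> float:
    """Time budget for processing one buffer before an underrun occurs."""
    return buffer_size / sample_rate


# Assumed sample rate of 48 kHz; substitute whatever your audio device uses.
for buffer_size in (128, 256, 512, 1024):
    deadline_ms = buffer_deadline_seconds(buffer_size, 48000) * 1000
    print(f"{buffer_size} samples: {deadline_ms:.2f} ms per callback")
```

Smaller buffers give lower latency but a tighter deadline, which is why the buffer size is the main parameter to vary when benchmarking.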

So, benchmarking a real-time audio system means running it and counting how many times it fails to meet the deadline. It is good to run the tests for a set of buffer sizes / latency values, to determine where the problems become critical. If the system is not dedicated 100% to the audio processing, one must also test with other computational loads running at the same time, since this will be a major influence.
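A minimal sketch of such a test, under some loud assumptions: it times a processing function offline on silent blocks rather than inside a real audio I/O callback, and `placeholder_process` is a hypothetical stand-in that you would replace with a call into the librosa or torchlibrosa pitch shifter. All names here are made up for illustration. A real test should also run inside the actual audio callback and under background load.

```python
import time


def count_deadline_misses(process, buffer_size, sample_rate=48000, n_buffers=200):
    """Run `process` on n_buffers blocks and count blocks that exceed the deadline.

    Returns (misses, worst_elapsed_seconds, deadline_seconds).
    """
    deadline = buffer_size / sample_rate  # time budget per block
    block = [0.0] * buffer_size           # silent input block
    misses = 0
    worst = 0.0
    for _ in range(n_buffers):
        start = time.perf_counter()
        process(block)
        elapsed = time.perf_counter() - start
        worst = max(worst, elapsed)
        if elapsed > deadline:
            misses += 1
    return misses, worst, deadline


def placeholder_process(block):
    # Hypothetical stand-in: replace with the pitch-shifting call under test.
    return [0.5 * x for x in block]


# Sweep buffer sizes to see where deadline misses start to appear.
for size in (64, 128, 256, 512, 1024):
    misses, worst, deadline = count_deadline_misses(placeholder_process, size)
    print(f"buffer={size:5d}  deadline={deadline * 1e3:6.2f} ms  "
          f"worst={worst * 1e3:6.3f} ms  misses={misses}")
```

Looking at the worst-case time as well as the miss count is useful, since a single slow block (for example from garbage collection) is enough to cause an audible glitch.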

With this methodology, you will be able to determine which of the two libraries works best for your particular use-case and system.