python · audio · portaudio · pyaudio

PyAudio : What is the most efficient format and pack/unpack method to use in callback mode?


I am using pyaudio in callback mode with the paFloat32 format, 2 channels, and 1024 frames per buffer, and I am looking for a more efficient way to exchange the input/output audio buffer data.

To unpack an input audio buffer and get the list of float samples, I use:

fmt       = str( N_CHANNELS * BUFFER_SIZE ) + 'f'   # e.g. '2048f' for 2 channels * 1024 frames
in_floats = struct.unpack( fmt, in_data )

Using struct.pack() and struct.unpack() is quite inefficient and takes significant CPU resources, almost as much as the audio signal processing itself. Since most sound cards are 16-bit, I also tried the paInt16 format, but the results were almost identical.

What would be the most efficient format and pack/unpack method to use in callback mode (of course maintaining full resolution)?

Edit: PyAudio exchanges data using binary streams (buffers) similar to the C data structures used with PortAudio. I need to unpack the in_data input buffer to get the float samples and analyze them. Everything is OK, except that the unpacking is a bit slow.
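
To give the full picture, here is a stripped-down sketch of the kind of callback I am using (the 44100 Hz sample rate and the pass-through return value are just placeholders, not my real processing):

import struct
import pyaudio

N_CHANNELS  = 2
BUFFER_SIZE = 1024   # frames per buffer

def callback(in_data, frame_count, time_info, status):
    # in_data holds frame_count * N_CHANNELS interleaved float32 samples as raw bytes
    fmt       = str(N_CHANNELS * frame_count) + 'f'
    in_floats = struct.unpack(fmt, in_data)
    # ... analyze in_floats here ...
    return (in_data, pyaudio.paContinue)

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32, channels=N_CHANNELS, rate=44100,
                 input=True, output=True, frames_per_buffer=BUFFER_SIZE,
                 stream_callback=callback)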


Solution

  • Using either NumPy or the stdlib's array module is going to be much faster, because most of the cost of struct.unpack isn't the unpacking, it's the boxing up of each float value in a Python float object.

    For example:

    In [1177]: f = [random.random() for _ in range(65536)]
    
    In [1178]: b = struct.pack('65536f', *f)
    
    In [1179]: %timeit struct.unpack('65536f', b)
    1000 loops, best of 3: 1.61 ms per loop
    
    In [1180]: %timeit array.array('f', b)
    100000 loops, best of 3: 17.7 µs per loop
    

    That's roughly 90x as fast. And you've got an iterable of floats either way; it's just an array instead of a tuple.

    However, if you're planning to actually do any arithmetic on these values, you're still going to have to iterate those values—and array will have to unbox each of them as you do so, which is going to add back in a good chunk of the time you've saved.

    That's where NumPy comes in; I doubt np.frombuffer(b, dtype=np.float32) is going to be hugely faster than array.array('f', b) to create, but it will allow you to do vectorized arithmetic directly on the unboxed values. For example:

    In [1186]: a1 = array.array('f', b)
    
    In [1187]: a2 = np.frombuffer(b, dtype=np.float32)
    
    In [1189]: %timeit sum(f)
    1000 loops, best of 3: 586 µs per loop
    
    In [1189]: %timeit sum(a1)
    1000 loops, best of 3: 907 µs per loop
    
    In [1190]: %timeit a2.sum()
    10000 loops, best of 3: 80.3 µs per loop
    

    As you can see, iterating over the array.array with sum is noticeably slower than summing the list of Python floats (I used sum because the actual iteration and arithmetic are done in C), but the vectorized NumPy sum is roughly 7x as fast; a sketch of how this drops into the callback follows below.
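
    To show how this plugs into the callback, here is a minimal sketch using np.frombuffer (the 0.5 gain factor is a placeholder for whatever analysis or DSP you actually do; the stream itself is opened with paFloat32 exactly as in the question):

    import numpy as np
    import pyaudio

    def callback(in_data, frame_count, time_info, status):
        # View the raw bytes as unboxed float32 samples -- no per-sample Python objects
        samples = np.frombuffer(in_data, dtype=np.float32)
        # Vectorized arithmetic runs in C; the 0.5 gain is just placeholder processing
        processed = samples * 0.5
        # Convert back to raw bytes for the output half of the duplex stream
        return (processed.astype(np.float32).tobytes(), pyaudio.paContinue)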