I am using pyaudio in callback mode with paFloat32 format, 2 channels, 1024 frames per buffer, and I am looking for a more efficient way to exchange input/output audio buffer data.
To unpack an input audio buffer and get the list of float samples, I use:
import struct

fmt = str(N_CHANNELS * BUFFER_SIZE) + 'f'
in_floats = struct.unpack(fmt, in_data)
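For context, the callback setup is roughly like this (a minimal sketch; the 44100 Hz rate and the pass-through "processing" line are placeholders, not the actual code):

import struct
import pyaudio

N_CHANNELS = 2
BUFFER_SIZE = 1024
RATE = 44100  # placeholder sample rate

def callback(in_data, frame_count, time_info, status):
    # Unpack interleaved float32 samples, process them, and repack for output.
    fmt = str(N_CHANNELS * frame_count) + 'f'
    in_floats = struct.unpack(fmt, in_data)
    out_floats = in_floats                      # signal processing would go here
    out_data = struct.pack(fmt, *out_floats)
    return (out_data, pyaudio.paContinue)

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32, channels=N_CHANNELS, rate=RATE,
                 frames_per_buffer=BUFFER_SIZE, input=True, output=True,
                 stream_callback=callback)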
Using struct.pack() and struct.unpack() is quite inefficient and takes significant CPU resources, almost as much as the audio signal processing itself. Since most sound cards are 16 bit, I also tried the paInt16 format, but the results are almost identical.
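For reference, a sketch of what that 16-bit unpack looks like (same N_CHANNELS, BUFFER_SIZE and in_data as above; 'h' is struct's signed 16-bit code, and the normalization step is only illustrative):

import struct

fmt = str(N_CHANNELS * BUFFER_SIZE) + 'h'
in_ints = struct.unpack(fmt, in_data)

# Optionally scale to floats in [-1.0, 1.0) if the processing expects normalized samples.
in_floats = [s / 32768.0 for s in in_ints]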
What would be the most efficient format and pack/unpack method to use in callback mode (of course maintaining full resolution)?
Edit: PyAudio exchanges data using binary streams or buffers similar to the C data structures used with PortAudio. I need to unpack the in_data input buffer to get the float samples and analyze them. Everything is OK, except the unpack is a bit slow.
Using either NumPy or the stdlib's array module is going to be much faster, because most of the cost of struct.unpack isn't the unpacking, it's the boxing up of each float value in a Python float object.
For example:
In [1177]: f = [random.random() for _ in range(65536)]
In [1178]: b = struct.pack('65536f', *f)
In [1179]: %timeit struct.unpack('65536f', b)
1000 loops, best of 3: 1.61 ms per loop
In [1180]: %timeit array.array('f', b)
100000 loops, best of 3: 17.7 µs per loop
That's nearly 100x as fast. And you've got an iterable of floats either way; it's just that it's an array instead of a tuple.
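The same trick works in the other direction for the output buffer: array.array has a tobytes() method that hands back the raw bytes without going through struct.pack. A sketch (processed_floats is a placeholder for whatever your processing produces):

import array

# Input side: raw bytes -> array of C floats, no per-sample Python objects yet.
in_samples = array.array('f', in_data)

# Output side: an iterable of floats -> raw bytes, without struct.pack.
out_data = array.array('f', processed_floats).tobytes()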
However, if you're planning to actually do any arithmetic on these values, you're still going to have to iterate over them, and array will have to box each one up into a Python float as you do so, which is going to add back in a good chunk of the time you've saved.
That's where NumPy comes in; I doubt np.frombuffer(b, dtype=np.float32) is going to be hugely faster than array.array('f', b) to create, but it will allow you to do vectorized arithmetic directly on the unboxed values. For example:
In [1186]: a1 = array.array('f', b)
In [1187]: a2 = np.frombuffer(b, dtype=np.float32)
In [1188]: %timeit sum(f)
1000 loops, best of 3: 586 µs per loop
In [1189]: %timeit sum(a1)
1000 loops, best of 3: 907 µs per loop
In [1190]: %timeit a2.sum()
10000 loops, best of 3: 80.3 µs per loop
As you can see, iterating over the array.array makes this about 1.5x as slow as the plain list (I used sum because the actual iteration and arithmetic are done in C), but using the NumPy array instead makes it about 7x as fast.
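Putting that together with the callback from the question, a sketch of what the NumPy-based exchange could look like (the halved-gain "processing" is just a placeholder; note that np.frombuffer over the immutable in_data bytes gives a read-only view, so build or copy into a new array before modifying anything in place):

import numpy as np
import pyaudio

def callback(in_data, frame_count, time_info, status):
    # View the raw buffer as float32 samples; no per-sample Python objects are created.
    samples = np.frombuffer(in_data, dtype=np.float32)

    # Vectorized processing produces a new (writable) array.
    processed = samples * 0.5            # placeholder: halve the gain

    # Make sure the result is still float32, then hand the raw bytes back to PortAudio.
    return (processed.astype(np.float32).tobytes(), pyaudio.paContinue)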