
Python PyAudio using multiprocessing


I'm trying to grab samples from a stream of audio and put them in a shared Queue. I have another process that pulls from this queue.

When I run it, I get this error:

* recording
Traceback (most recent call last):
  File "record.py", line 43, in <module>
    data = stream.read(CHUNK)
  File "/Library/Python/2.7/site-packages/pyaudio.py", line 605, in read
    return pa.read_stream(self._stream, num_frames)
IOError: [Errno Input overflowed] -9981

EDIT: Apparently this problem has been around for a while with no solution posted (I tried the suggestions there):

Here's the (simplified) code:

import pyaudio
import wave
import array
import time
from multiprocessing import Queue, Process

CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 2

p = pyaudio.PyAudio()
left = Queue()
right = Queue()

def other(q1, q2):
    while True:
        try:
            a = q1.get(False)
        except Exception:
            pass

        try:
            b = q2.get(False)
        except Exception:
            pass

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

print("* recording")
Process(target=other, args=(left, right)).start()

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    byte_string = ''.join(data)
    nums = array.array('h', byte_string)
    for elt in nums[1::2]:
        left.put(elt)
    for elt in nums[0::2]:
        right.put(elt)

print("* done recording")

stream.stop_stream()
stream.close()
print("terminated")

What am I doing wrong? I'm on Mac OS X with Python 2.7. I installed PortAudio through Homebrew and tried both the pip and the dmg installation of pyaudio, with no luck with either.


Solution

  • The input-overflow error may be because your frames_per_buffer and read chunk size are too small. Try different, mostly larger, values for them, e.g. 512, 2048, 4096, 8192, etc.

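    import pyaudio

    FORMAT = pyaudio.paInt16
    CHANNELS = 2
    RATE = 44100
    RECORD_SECONDS = 2

    p = pyaudio.PyAudio()

    # Try every combination of buffer size (CHUNK1) and read size (CHUNK2)
    # and report which ones overflow.
    for CHUNK1 in [512, 2048, 4096, 8192, 16384]:
        for CHUNK2 in [512, 2048, 4096, 8192, 16384]:
            stream = p.open(format=FORMAT,
                            channels=CHANNELS,
                            rate=RATE,
                            input=True,
                            frames_per_buffer=CHUNK1)

            try:
                print("%d %d" % (CHUNK1, CHUNK2))
                # Number of reads of CHUNK2 frames needed to cover RECORD_SECONDS
                for i in range(0, int(RATE * RECORD_SECONDS / CHUNK2)):
                    data = stream.read(CHUNK2)
            except IOError as e:
                print("Boohoo: %s" % e)

            stream.stop_stream()
            stream.close()

    p.terminate()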
    

    Update

    OK, I think I've got it. PyAudio will raise an overflow error if you wait too long between one read and the next.
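To put a number on "too long": each read of CHUNK frames covers CHUNK / RATE seconds of audio, so, roughly, everything the loop does between two reads has to fit in that window or the input buffer fills up. A minimal sketch of the arithmetic; the helper name `read_budget_ms` is hypothetical, and the real margin may be somewhat larger because PortAudio and the host API can buffer more than one chunk:

```python
def read_budget_ms(frames_per_buffer, rate):
    """Approximate time (ms) that one buffer of audio lasts, i.e. roughly how
    long the loop may spend between stream.read() calls before overflowing.
    Hypothetical helper; it ignores any extra buffering done by PortAudio."""
    return 1000.0 * frames_per_buffer / rate


# Values from the question: CHUNK = 1024, RATE = 44100, plus two larger sizes.
for chunk in (1024, 4096, 16384):
    print("%5d frames -> %.1f ms" % (chunk, read_budget_ms(chunk, 44100)))
```

With CHUNK = 1024 at 44100 Hz that is only about 23 ms, and in that time the question's loop does 2048 separate `put()` calls. Larger chunks give proportionally more headroom, which is why raising frames_per_buffer can help, but they do not rescue a loop that is simply too slow.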

    byte_string = ''.join(data)
    nums = array.array('h', byte_string)
    for elt in nums[1::2]:
        left.put(elt)
    for elt in nums[0::2]:
        right.put(elt)
    

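    This does a lot of very slow processing, especially the two for loops, which run in pure Python and call put() once per sample; every multiprocessing.Queue.put() also has to pickle its item and send it through a pipe. Let the processing process receive a whole chunk of mono-stream data that it can handle at once, instead of one int at a time.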

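    import numpy as np


    ...

    # paInt16 samples are signed 16-bit, so use int16 (not uint16).
    # Interleaved stereo frames are ordered L, R, L, R, ..., so the
    # even indices hold the left channel.
    n = np.frombuffer(data, dtype=np.int16)
    left.put(n[0::2])
    right.put(n[1::2])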
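On the other side, the consumer can then block on `get()` and handle one whole chunk per call, instead of spinning on `get(False)` the way `other()` does in the question. A minimal sketch, assuming the producer puts one array per chunk and a `None` sentinel when it is done; `consume`, `fake_chunks` and the sentinel convention are illustrative names, not part of PyAudio, and the fake producer stands in for `stream.read` so the example runs without an audio device:

```python
import array
from multiprocessing import Process, Queue


def consume(q, results):
    """Pull whole chunks until the None sentinel arrives.

    Blocking get() sleeps until data is available instead of busy-waiting.
    This sketch only counts samples and reports the total via `results`."""
    total = 0
    while True:
        chunk = q.get()          # blocks; one call per chunk, not per sample
        if chunk is None:        # sentinel: the producer is finished
            break
        total += len(chunk)      # real code would process the chunk here
    results.put(total)


def fake_chunks(n_chunks, chunk_frames):
    """Hypothetical stand-in for stream.read(): interleaved stereo int16."""
    for i in range(n_chunks):
        yield array.array('h', [i % 100] * (2 * chunk_frames))


if __name__ == '__main__':
    left, results = Queue(), Queue()
    worker = Process(target=consume, args=(left, results))
    worker.start()
    for nums in fake_chunks(n_chunks=10, chunk_frames=1024):
        left.put(nums[0::2])     # one put per chunk: the left channel only
    left.put(None)
    print("consumer saw %d samples" % results.get())
    worker.join()
```

The sentinel also gives the worker a clean way to exit, whereas the question's `while True` loop never terminates on its own.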
    

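    I don't even want to imagine what the for loops were doing to latency, but even the comparatively small gain from slicing a NumPy array instead of an array.array is worth noting. Note that n[1::2] alone only creates a view, so the like-for-like comparison with a[1::2] is n[1::2].copy():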

    a=array.array('h',s)
    n=np.array(a)
    
    In [26]: %timeit n[1::2]
    1000000 loops, best of 3: 669 ns per loop
    
    In [27]: %timeit n[1::2].copy()
    1000 loops, best of 3: 725 us per loop
    
    In [28]: %timeit a[1::2]
    1000 loops, best of 3: 1.91 ms per loop
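To check what the per-sample loop costs for one chunk, here is a rough timing sketch using only the standard library: it times one `put()` per sample against one `put()` per channel for a single 1024-frame stereo chunk. It uses `queue.Queue` as a stand-in, an assumption that understates the real cost, since `multiprocessing.Queue` additionally pickles every item and writes it to a pipe. The numbers depend on the machine, so no expected output is shown; `time_per_chunk`, `per_sample` and `per_chunk` are hypothetical helpers:

```python
import array
import queue
import timeit

CHUNK = 1024  # frames per read, as in the question (stereo -> 2048 samples)


def per_sample(nums, left, right):
    # Question-style approach: one put() per sample.
    for elt in nums[0::2]:
        left.put(elt)
    for elt in nums[1::2]:
        right.put(elt)


def per_chunk(nums, left, right):
    # Answer-style approach: one put() per channel per chunk.
    left.put(nums[0::2])
    right.put(nums[1::2])


def time_per_chunk(func, repeats=50):
    """Average seconds needed to push one fake stereo chunk through `func`."""
    nums = array.array('h', range(-CHUNK, CHUNK))  # 2048 fake int16 samples
    left, right = queue.Queue(), queue.Queue()
    return timeit.timeit(lambda: func(nums, left, right), number=repeats) / repeats


if __name__ == '__main__':
    budget = CHUNK / 44100.0  # seconds of audio covered by one read
    for func in (per_sample, per_chunk):
        t = time_per_chunk(func)
        print("%-10s %8.1f us per chunk (budget %.0f us)"
              % (func.__name__, t * 1e6, budget * 1e6))
```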