Python Librosa with Microphone input

So I am trying to get librosa to work with a microphone input instead of just a wav file and have been running to a few problems. Initially I use the pyaudio library to connect to the microphone but I am having trouble translating this data for librosa to use. Any suggestions on how this should be approached, or is it even possible?

A few things I tried include receiving data from pyaudio mic, decode it into an array of floats and pass it to librosa (as from the docs, this is what librosa does with wav files with .load), but it doesn't work as it produces the following error: "librosa.util.exceptions.ParameterError: Audio buffer is not finite everywhere"


FORMAT = pyaudio.paInt16
RATE = 44100
CHUNK = 2048
WIDTH = 2
CHANNELS = 2
RECORD_SECONDS = 5

stream = audio.open(format=FORMAT,
                    channels = CHANNELS,
                    rate = RATE,
                    input=True,
                    output=True,
                    frames_per_buffer=CHUNK)
while True:
        data = stream.read(CHUNK)
        data_float = np.fromstring(data , dtype=np.float16)
        data_np = np.array(data_float , dtype='d')
        # data in 1D array
        mfcc = librosa.feature.mfcc(data_np.flatten() , 44100)
        print(mfcc)

Solution

You can do it using callback function from pyaudio. I think it's easier using a class.

In the constructor __init__ you define all the constant you need and you set the FORMAT to pyaudio.paFloat32 that will enable you later to use it with librosa.

Then in the start method I open the audio stream. The stream_callback parameters in the .open() let you specify the way you want to implement your function.

callback method take as argument in_data, frame_count, time_info, flag then you receive the in_data in binaries. So you need to use np.frombuffer(in_data, dtype=np.float32) to convert them into a numpy array.

Once this is done you can use your numpy.ndarray as you normally would with librosa

I think this can be optimized, but this solution works fine for me, hoping it helps :)

import numpy as np
import pyaudio
import time
import librosa

class AudioHandler(object):
    def __init__(self):
        self.FORMAT = pyaudio.paFloat32
        self.CHANNELS = 1
        self.RATE = 44100
        self.CHUNK = 1024 * 2
        self.p = None
        self.stream = None

    def start(self):
        self.p = pyaudio.PyAudio()
        self.stream = self.p.open(format=self.FORMAT,
                                  channels=self.CHANNELS,
                                  rate=self.RATE,
                                  input=True,
                                  output=False,
                                  stream_callback=self.callback,
                                  frames_per_buffer=self.CHUNK)

    def stop(self):
        self.stream.close()
        self.p.terminate()

    def callback(self, in_data, frame_count, time_info, flag):
        numpy_array = np.frombuffer(in_data, dtype=np.float32)
        librosa.feature.mfcc(numpy_array)
        return None, pyaudio.paContinue

    def mainloop(self):
        while (self.stream.is_active()): # if using button you can set self.stream to 0 (self.stream = 0), otherwise you can use a stop condition
            time.sleep(2.0)


audio = AudioHandler()
audio.start()     # open the the stream
audio.mainloop()  # main operations with librosa
audio.stop()