python audio microphone pyaudio librosa

Python Librosa with Microphone input


I am trying to get librosa to work with microphone input instead of just a wav file, and I have been running into a few problems. I use the pyaudio library to connect to the microphone, but I am having trouble translating its data into something librosa can use. Any suggestions on how this should be approached, or is it even possible?

One thing I tried was to read data from the pyaudio mic stream, decode it into an array of floats, and pass that to librosa (according to the docs, this is what librosa does with wav files in .load). It doesn't work and produces the following error: "librosa.util.exceptions.ParameterError: Audio buffer is not finite everywhere"


FORMAT = pyaudio.paInt16
RATE = 44100
CHUNK = 2048
WIDTH = 2
CHANNELS = 2
RECORD_SECONDS = 5

stream = audio.open(format=FORMAT,
                    channels = CHANNELS,
                    rate = RATE,
                    input=True,
                    output=True,
                    frames_per_buffer=CHUNK)
while True:
        data = stream.read(CHUNK)
        data_float = np.fromstring(data , dtype=np.float16)
        data_np = np.array(data_float , dtype='d')
        # data in 1D array
        mfcc = librosa.feature.mfcc(data_np.flatten() , 44100)
        print(mfcc)


Solution

  • You can do this with PyAudio's callback mode. I find it easier to wrap everything in a class.

    In the constructor __init__, define all the constants you need and set FORMAT to pyaudio.paFloat32, so the samples arrive as 32-bit floats that librosa can use directly.

    The start method opens the audio stream. The stream_callback parameter of .open() takes the function that PyAudio calls each time a new chunk of audio is available.

    The callback method takes in_data, frame_count, time_info and flag as arguments. in_data arrives as raw bytes, so use np.frombuffer(in_data, dtype=np.float32) to convert it into a NumPy array.

    Once this is done, you can use the numpy.ndarray as you normally would with librosa.

    I think this can be optimized, but this solution works fine for me. Hope it helps :)

    import numpy as np
    import pyaudio
    import time
    import librosa
    
    class AudioHandler(object):
        def __init__(self):
            self.FORMAT = pyaudio.paFloat32
            self.CHANNELS = 1
            self.RATE = 44100
            self.CHUNK = 1024 * 2
            self.p = None
            self.stream = None
    
        def start(self):
            self.p = pyaudio.PyAudio()
            self.stream = self.p.open(format=self.FORMAT,
                                      channels=self.CHANNELS,
                                      rate=self.RATE,
                                      input=True,
                                      output=False,
                                      stream_callback=self.callback,
                                      frames_per_buffer=self.CHUNK)
    
        def stop(self):
            self.stream.close()
            self.p.terminate()
    
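        def callback(self, in_data, frame_count, time_info, flag):
            # in_data is raw bytes; reinterpret it as float32 samples (matches paFloat32)
            numpy_array = np.frombuffer(in_data, dtype=np.float32)
            # pass audio and sample rate by keyword (required in librosa 0.10+)
            librosa.feature.mfcc(y=numpy_array, sr=self.RATE)
            return None, pyaudio.paContinue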
    
        def mainloop(self):
            while self.stream.is_active():  # runs until the stream stops; replace with your own stop condition (e.g. a button)
                time.sleep(2.0)
    
    
    audio = AudioHandler()
    audio.start()     # open the stream
    audio.mainloop()  # keep the main thread alive while the callback runs librosa
    audio.stop()
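
If the stream has to stay in paInt16, as in the question, the bytes can be converted by hand instead. The following is only a minimal sketch of that idea, not something from the answer above: the helper name pcm16_to_float and the synthetic test tone are made up for illustration, and the audio is assumed to be interleaved 16-bit PCM as returned by stream.read(). It also shows why reading the same bytes as np.float16 can trigger the "not finite everywhere" error.

```python
import numpy as np


def pcm16_to_float(data: bytes, channels: int = 1) -> np.ndarray:
    """Convert interleaved 16-bit PCM bytes to mono float32 in [-1.0, 1.0).

    Hypothetical helper for illustration; not part of PyAudio or librosa.
    """
    # Interpret the bytes as the integers they really are (paInt16 -> np.int16).
    samples = np.frombuffer(data, dtype=np.int16)
    # Scale into the floating-point range that librosa works with.
    audio = samples.astype(np.float32) / 32768.0
    if channels > 1:
        # Channels are interleaved (L R L R ...); average them down to mono.
        audio = audio.reshape(-1, channels).mean(axis=1)
    return audio


# Synthetic stand-in for one stream.read(CHUNK) result:
# a 440 Hz tone, stereo, 16-bit, 2048 frames at 44100 Hz.
t = np.arange(2048) / 44100
tone = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
raw = np.repeat(tone, 2).tobytes()  # duplicate each sample for left and right

mono = pcm16_to_float(raw, channels=2)
print(mono.shape, mono.dtype, np.isfinite(mono).all())

# Reinterpreting int16 bytes as float16 can hit inf/NaN bit patterns:
# the int16 value -1 is a float16 NaN, and 31744 (0x7C00) is float16 +inf.
bad = np.frombuffer(np.array([-1, 31744], dtype=np.int16).tobytes(),
                    dtype=np.float16)
print(np.isfinite(bad))
```

The resulting array could then be passed on in the same way as in the callback above, for example librosa.feature.mfcc(y=mono, sr=44100).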