python raspberry-pi speech-recognition microphone pyaudio

Adafruit I2S MEMS microphone is not working with voice activity detection system


I am trying to build a speech-to-text system on a Raspberry Pi, and I am having problems with voice activity detection (VAD). I am using DeepSpeech's VAD script. The Adafruit I2S MEMS microphone only supports 32-bit PCM audio, so I modified the script to record 32-bit audio and then convert it to 16-bit for DeepSpeech's processing. The frame generation and conversion parts are below:

import numpy as np

for frame in frames:
    if frame is not None:
        if spinner:
            spinner.start()
        # Get the frame generated by PyAudio and webrtcvad
        dp_frame = np.frombuffer(frame, np.int32)
        # Convert to 16-bit PCM by keeping the top 16 bits of each sample
        dp_frame = (dp_frame >> 16).astype(np.int16)
        # Feed the 16-bit audio to the DeepSpeech stream (speech to text)
        stream_context.feedAudioContent(dp_frame)

The PyAudio settings are:

'format': paInt32,
'channels': 1,
'rate': 16000,

When the VAD starts, it always generates non-empty frames, even when there is no voice nearby. However, when I set a timer to stop recording every 5 seconds, it shows that the recording itself was done successfully. I think the problem is that the supply voltage adds some noise to the signal, so silence is never detected and frame generation never ends. How can I solve this problem?


Solution

  • I looked up DeepSpeech's VAD script. The problem is with webrtcvad: its VAD only accepts 16-bit mono PCM audio, sampled at 8000, 16000, 32000 or 48000 Hz. So you need to convert the 32-bit frame (I mean the frame that PyAudio outputs) to 16-bit before passing it to webrtcvad.is_speech(). I made that change and it worked fine.
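
A minimal sketch of that conversion, in case it helps. This is not the actual change made to the DeepSpeech script: the helper names int32_frame_to_int16_bytes and frame_is_speech are made up for illustration, and the sample rate is taken from the PyAudio settings in the question. The only requirement on vad is an is_speech(buf, sample_rate) method, which is what webrtcvad.Vad provides.

```python
import numpy as np

SAMPLE_RATE = 16000  # Hz, taken from the PyAudio settings in the question


def int32_frame_to_int16_bytes(frame: bytes) -> bytes:
    """Keep the top 16 bits of each 32-bit sample and return raw 16-bit PCM.

    Hypothetical helper; assumes native-endian signed 32-bit mono samples,
    which is what PyAudio's paInt32 format delivers.
    """
    samples32 = np.frombuffer(frame, dtype=np.int32)
    return (samples32 >> 16).astype(np.int16).tobytes()


def frame_is_speech(vad, frame32: bytes, sample_rate: int = SAMPLE_RATE) -> bool:
    """Convert a 32-bit frame first, then ask the VAD about the 16-bit version.

    `vad` is any object with is_speech(buf, sample_rate), e.g. webrtcvad.Vad.
    """
    return vad.is_speech(int32_frame_to_int16_bytes(frame32), sample_rate)


# Usage with the real library (third-party: pip install webrtcvad):
#   import webrtcvad
#   vad = webrtcvad.Vad(3)              # aggressiveness from 0 to 3
#   if frame_is_speech(vad, frame):     # `frame` is one 32-bit PyAudio frame
#       ...
```

Note that webrtcvad also expects each frame to be 10, 20 or 30 ms long. At 16000 Hz, a 20 ms frame is 320 samples, so 1280 bytes as 32-bit audio and 640 bytes after conversion.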