I am getting a problem while trying to get the binary result using webrctvad in a wave format audio file. I am using librosa in order to load the audio file in .wav format. Can anyone tell me how to use librosa along with webrtcvad in order to get the binary output of whether the audio contains speech or not?
Webrtcvad module works correctly with the wave module
The above link helped me a lot but still, I am confused as the link contains a good explanation but during implementation lot of errors are coming.
py-webrtcvad, expects the audio data to be 16bit PCM little-endian - as is the most common storage format in WAV files.
librosa
and its underlying I/O library pysoundfile
however always returns floating point arrays in the range [-1.0, 1.0]
. To convertt this to bytes containing 16bit PCM you can use the following float_to_pcm16
function.
And I have tested to use the read_pcm16
function a direct replacement of read_wave
in the official py-webrtcvad example. But allowing to open any audio file supported by soundfile (WAV, FLAC, OGG) etc.
def float_to_pcm16(audio):
import numpy
ints = (audio * 32767).astype(numpy.int16)
little_endian = ints.astype('<u2')
buf = little_endian.tostring()
return buf
def read_pcm16(path):
import soundfile
audio, sample_rate = soundfile.read(path)
assert sample_rate in (8000, 16000, 32000, 48000)
pcm_data = float_to_pcm16(audio)
return pcm_data, sample_rate