I am trying to replicate this article, but its corresponding GitHub repo is written quite badly. In the article, a neural network is trained on manually corrupted audio signals. Unfortunately, the researchers added neither the audio files nor clean code showing how they corrupted them. In the paper they write:
..for the noisy test set, the 100 utterances were corrupted with four unseen noise types (engine, white, street, and baby cry), at six SNR levels (-6 dB, 0 dB, 6 dB, 12 dB, 18 dB, and 24 dB); for the enhanced set, the utterances in the noisy set were enhanced by the enhancement model above.
Now to the question: is there a Python library (R/MATLAB libraries are fine as well) that takes as input a signal, the type of desired noise, and the SNR, and returns a corrupted signal? If not, where can I get engine or crying-baby noise types?
Thanks!
So, if someone runs into the same problem, here is what I did. First, I looked for databases of real-life noises. Most of them cost money and offer a limited variety of environments (see the AURORA-2 corpus, the CHiME background noise data, or the NOISEX-92 database). Finally I found the DEMAND dataset, which includes multi-channel noises from 16 different environments (office, car, road, etc.) and is freely available.
Now, before merging noise and signal, one has to verify they share the same sampling rate (as I understand from this discussion it is actually not such a severe problem, but it is better to be on the safe side). If you are using Python, you can use librosa.resample to standardize the two. After that, you can add the two signals. When adding the noise, you may want to control the magnitude of each input (signal and noise). You can use the signal-to-noise ratio formula given below to find $a$, the multiplier by which you have to scale your noise in order to get the desired signal-to-noise ratio (SNR).
$$\mathrm{SNR} = 20\log_{10}\left(\frac{RMS_{signal}}{a \cdot RMS_{noise}}\right) \quad\Longrightarrow\quad a = \frac{RMS_{signal}}{RMS_{noise} \cdot 10^{\,\mathrm{SNR}/20}}$$

where the desired SNR is given, and the two RMS values are calculated from your data.
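To make this concrete, here is a minimal sketch of the mixing step in NumPy. The function name `mix_at_snr` is my own; it assumes both arrays are 1-D floats already at the same sampling rate (resample beforehand, e.g. with librosa.resample), and it tiles the noise if it is shorter than the signal:

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal` at the desired SNR (in dB).

    Both arrays are assumed to be 1-D floats sampled at the SAME rate.
    """
    # Tile/trim the noise so it covers the whole signal
    if len(noise) < len(signal):
        noise = np.tile(noise, int(np.ceil(len(signal) / len(noise))))
    noise = noise[: len(signal)]

    # Solve SNR = 20*log10(RMS_signal / (a * RMS_noise)) for a
    rms_signal = np.sqrt(np.mean(signal ** 2))
    rms_noise = np.sqrt(np.mean(noise ** 2))
    a = rms_signal / (rms_noise * 10 ** (snr_db / 20))

    # Add the scaled noise
    return signal + a * noise

# Sanity check: the measured SNR of the mix matches the requested one
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.standard_normal(8000)
noisy = mix_at_snr(clean, noise, snr_db=6.0)
added = noisy - clean
measured = 20 * np.log10(
    np.sqrt(np.mean(clean ** 2)) / np.sqrt(np.mean(added ** 2))
)
```

Running it with `snr_db=6.0`, the `measured` value comes back as 6 dB (up to floating-point error), which confirms the scaling factor $a$ does what the formula promises.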