Tags: speech-recognition, noise, noise-reduction, pocketsphinx-android

Noise reduction before pocketsphinx reduces recognition accuracy


I am trying to improve the recognition accuracy of pocketsphinx in noisy environments. However, users may run the app in widely varying environments, so training with noise is not something that I want to do.

My question is: would noise reduction applied to the speech signal before feeding it to pocketsphinx necessarily reduce recognition accuracy?

If so, which features of the speech need to be retained after noise reduction? Currently I observe that the WER goes up from ~40% (free-form language) to ~60% if I use noise reduction.

Just to add, the speech does sound better perceptually after noise reduction.
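
For reference, WER (word error rate) is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. Below is a minimal sketch of that calculation, assuming plain whitespace tokenization and no text normalization; the function name `word_error_rate` is illustrative, and the scoring tool actually used for the figures above is not stated.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Scoring the same test utterances with and without noise reduction through one function like this keeps the two WER figures directly comparable.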

Pocketsphinx argfile:

-lm   lm_giga_64k_vp_3gram.DMP
-dict lm_giga_64k_vp.sphinx.dic 
-hmm  voxforge_en_sphinx.cd_cont_5000

The idea here is to demonstrate an increase in speech recognition accuracy with noise reduction enabled. Intuitively, this should happen unless the noise reduction algorithm is completely messing up the spectral content of the signal.

Any help would be appreciated.


Solution

  • Currently I observe that the WER goes up from ~40% (free-form language) to ~60% if I use noise reduction.

    Those are very bad rates, because:

    1) You are using outdated models.

    2) You are using an outdated pocketsphinx that has no built-in noise reduction.

    External noise reduction usually degrades speech recognition accuracy. Fortunately, the latest pocketsphinx has its own noise reduction module, which makes it quite robust to noise. You just need to update. To get the best results you need to:

    1) Download and use the latest sphinxbase and pocketsphinx from http://github.com/cmusphinx

    2) Download the latest acoustic and language models:

    http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/en-us.tar.gz/download

    http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model/cmusphinx-5.0-en-us.lm.dmp/download

    That will allow you to set a proper baseline. To experiment with noise reduction on and off, you can use the command-line config option:

    -remove_noise yes/no
    

    For further advice on how to improve the accuracy, including noise robustness, you should provide a test sample of the audio you want to recognize. For details, see:

    http://cmusphinx.sourceforge.net/wiki/faq#qwhy_my_accuracy_is_poor
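
The on/off comparison suggested above can be scripted so that both runs use identical models and differ only in `-remove_noise`. The sketch below only builds the two command lines and does not run them. It assumes the `pocketsphinx_continuous` tool from the updated release is on the PATH, and every file path in it (`test.wav`, the `en-us` model directory, the dictionary file name) is a placeholder to replace with your own; the helper name `build_command` is hypothetical.

```python
import shlex


def build_command(wav_path, remove_noise,
                  hmm="en-us",                       # placeholder: unpacked acoustic model directory
                  lm="cmusphinx-5.0-en-us.lm.dmp",   # language model file from the download link above
                  dict_path="cmudict-en-us.dict"):   # placeholder: pronunciation dictionary path
    """Return the argument list for one pocketsphinx_continuous run."""
    return [
        "pocketsphinx_continuous",
        "-infile", wav_path,
        "-hmm", hmm,
        "-lm", lm,
        "-dict", dict_path,
        "-remove_noise", "yes" if remove_noise else "no",
    ]


if __name__ == "__main__":
    # Print both variants; run each on the same test audio and compare WER.
    for enabled in (True, False):
        print(shlex.join(build_command("test.wav", enabled)))
```

Keeping everything except `-remove_noise` fixed means any difference in WER between the two runs can be attributed to the built-in noise reduction alone.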