To convert speech to text I am using the open-source CMU Sphinx API, which converts .wav audio into text using a language model matched to the language of the input speech.
PocketSphinx accuracy depends almost entirely on the models used. To get better results, adapt or train the acoustic model on speech from your target users.
If you don't want to train your own model, try tuning the parameters in `feat.params`, such as `-cmninit`.
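For reference, `feat.params` is a plain-text file of flag/value pairs shipped alongside the acoustic model. A minimal illustrative excerpt (the exact flags and values below are examples and depend on the model you downloaded, so check the file that came with yours):

```
-feat 1s_c_d_dd
-cmn current
-cmninit 40
```

Raising or lowering `-cmninit` (the initial cepstral mean) can help when the recognizer performs poorly on the first utterance, before cepstral mean normalization has adapted to the channel.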
Moreover, set `recognizer.setKeywordThreshold()` as low as possible; I prefer `recognizer.setKeywordThreshold(1e-40f)`.
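For context, here is a sketch of how that threshold fits into a keyword-spotting setup, assuming the pocketsphinx-android library (the model/dictionary file names and the "wakeup" search name are placeholders; this also needs an Android `Context` and the pocketsphinx assets synced to storage, so it is illustrative rather than drop-in):

```java
import java.io.File;
import edu.cmu.pocketsphinx.Assets;
import edu.cmu.pocketsphinx.SpeechRecognizer;
import edu.cmu.pocketsphinx.SpeechRecognizerSetup;

// Sync bundled model files to app storage (pocketsphinx-android helper).
File assetsDir = new Assets(context).syncAssets();

SpeechRecognizer recognizer = SpeechRecognizerSetup.defaultSetup()
        .setAcousticModel(new File(assetsDir, "en-us-ptm"))      // example model dir
        .setDictionary(new File(assetsDir, "cmudict-en-us.dict")) // example dictionary
        // Lower threshold = more sensitive keyword spotting (more false alarms).
        .setKeywordThreshold(1e-40f)
        .getRecognizer();

// Register a keyword search and start listening for the phrase.
recognizer.addKeyphraseSearch("wakeup", "hello computer");
recognizer.startListening("wakeup");
```

The threshold trades off misses against false alarms: a very small value like `1e-40f` makes the spotter fire more readily, which is usually what you want for a single wake phrase.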