I'm running PocketSphinx on Android (version 5prealpha). I'm using a file-defined keyword recognizer, specified by the following snippet (kwfile is the keyword definition file, and mRecognizer is an instance of SpeechRecognizer):
mRecognizer.addKeywordSearch(DESCRIPTOR, kwfile);
Overall, the recognition performance is pretty good, after having optimized the keyword thresholds. However, if I wait some arbitrary amount of time (5 sec up to several minutes) between one keyword utterance and the next, the recognition performance suffers on the second utterance. For example, I'll say "keyword," and it will be recognized. If I wait less than 5 sec and say "keyword" again, the second utterance will likely be recognized (recognition rate over 95%). If, however, I wait 15 sec, the recognition rate drops dramatically, to less than 50%.
My hypothesis is that when I say the keyword the second time, the recognizer is in the middle of a refresh; that is, it is between a Stop Recognition event and a Start Recognition event, and my speech spans that boundary. Here is a typical view of my logcat. Notice that after about 5 sec, the recognizer "refreshes". This happens roughly every 5 sec. Sometimes it can be as long as 30 sec between "refreshes", but generally it's around 5 sec.
09-26 07:11:06.800 20397-20397/...﹕ Start recognition "kwfile"
09-26 07:11:06.815 20397-23642/...﹕ Starting decoding
09-26 07:11:11.310 20397-20397/...﹕ Stop recognition
09-26 07:11:11.315 20397-20397/...﹕ Start recognition "kwfile"
09-26 07:11:11.360 20397-23645/...﹕ Starting decoding
09-26 07:11:17.405 20397-20397/...﹕ Stop recognition
So, my question is: Is there anything I can do to control this "refresh rate"? Is it caused by something I'm doing wrong in my RecognitionListener implementation (see below, but note that I typically don't get any partial results between utterances)? Is there a PocketSphinx API call that I don't know about to set this refresh rate? Or is there something I could change in the PocketSphinx source to improve this behavior?
class VoiceListener implements RecognitionListener {
    // true once a keyword command was handled in onPartialResult()
    private boolean isCommand = false;

    @Override
    public void onBeginningOfSpeech() {
        Log.d(TAG, "Beginning of Speech");
        // do nothing
    }

    @Override
    public void onEndOfSpeech() {
        Log.d(TAG, "End of Speech");
        // do nothing
    }

    @Override
    public void onPartialResult(Hypothesis arg0) {
        if (arg0 != null) {
            Log.d(TAG, "Partial results list: " + arg0.getHypstr());
            isCommand = false;
            // handle recognition results for keywords
            for (String command : this.getCurrentCommands()) {
                if (arg0.getHypstr().contains(command)) {
                    this.onRecognition(arg0.getHypstr());
                    isCommand = true;
                    mRecognizer.stop();
                }
            }
            // call stop, and let onResult() handle grammar results
            if (arg0.getHypstr().contains(Command.STOP_WORD))
                mRecognizer.stop();
        }
    }

    @Override
    public void onResult(Hypothesis results) {
        String data;
        if (results == null) {
            data = null;
        } else {
            data = results.getHypstr();
        }
        Log.d(TAG, "Final results: " + data);
        // handle grammar recognition results
        if (!isCommand) {
            this.onRecognition(data);
        }
        return;
    }
}
There is no such thing as a "refresh rate". Recognition accuracy probably drops because there is background noise that is not being filtered out properly. You can study the raw audio dumps to check whether silence is being counted as speech, and you can share those dumps to get help with this issue.
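One way to inspect a raw dump is to compute a level per short frame and see whether the stretches between keywords are really quiet. The sketch below is only an illustration: the class name RawDumpEnergy is made up, and it assumes the dump is headerless 16-bit little-endian mono PCM, so check that against how your recorder is actually configured.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for eyeballing noise levels in a raw audio dump. */
class RawDumpEnergy {
    /**
     * Returns the RMS level in dB relative to full scale for each complete
     * frame. ASSUMPTION: pcm holds headerless 16-bit little-endian mono samples.
     */
    static double[] frameLevelsDb(byte[] pcm, int frameSamples) {
        int totalSamples = pcm.length / 2;        // two bytes per sample
        int frames = totalSamples / frameSamples; // a trailing partial frame is ignored
        ByteBuffer buf = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN);
        double[] levels = new double[frames];
        for (int f = 0; f < frames; f++) {
            double sumSquares = 0.0;
            for (int i = 0; i < frameSamples; i++) {
                double sample = buf.getShort() / 32768.0; // scale to [-1, 1)
                sumSquares += sample * sample;
            }
            double rms = Math.sqrt(sumSquares / frameSamples);
            // Clamp so that digital silence does not produce -Infinity.
            levels[f] = 20.0 * Math.log10(Math.max(rms, 1e-10));
        }
        return levels;
    }
}
```

For example, you could load a dump with java.nio.file.Files.readAllBytes and call frameLevelsDb(bytes, 160), which is 10 ms per frame if the audio is sampled at 16 kHz. If the frames between utterances sit close to the level of the speech frames, background noise is a likely suspect.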
Some things in your code are not very reasonable. If you are using keyword spotting only, there is no need to stop and restart the recognizer in onEndOfSpeech as you are doing now; you can simply skip that. In spotting mode you do not need to wait for the end of speech to get a result: you can use the partial result to invoke actions and then restart the recognizer.
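A minimal sketch of that flow, kept independent of Android so it can be run on its own. RecognizerControl and KeywordDispatcher are hypothetical names: the interface only stands in for the stop() and startListening(String) calls you would make on your SpeechRecognizer, and the string passed in is what you would get from Hypothesis.getHypstr() in onPartialResult.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the two recognizer calls this sketch needs. */
interface RecognizerControl {
    void stop();
    void startListening(String searchName);
}

/** Acts on keyword hits as soon as they show up in a partial result. */
class KeywordDispatcher {
    private final RecognizerControl recognizer;
    private final String searchName;
    private final List<String> commands;
    private final List<String> fired = new ArrayList<>();

    KeywordDispatcher(RecognizerControl recognizer, String searchName,
                      List<String> commands) {
        this.recognizer = recognizer;
        this.searchName = searchName;
        this.commands = commands;
    }

    /** Call with the partial hypothesis text, or null when there is none. */
    boolean onPartialText(String hypstr) {
        if (hypstr == null) {
            return false;
        }
        for (String command : commands) {
            if (hypstr.contains(command)) {
                fired.add(command); // invoke the action here
                // Restart the search once so the same hit is not reported again.
                recognizer.stop();
                recognizer.startListening(searchName);
                return true;        // at most one restart per partial result
            }
        }
        return false;
    }

    /** Commands handled so far, in order. */
    List<String> fired() {
        return fired;
    }
}
```

In a real listener, onPartialResult would forward arg0.getHypstr() to onPartialText, and onEndOfSpeech would stay empty. Note that calling stop() on the real recognizer still delivers a final result to onResult, so that callback should ignore hits that were already handled here.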