I finally made it: my WER (word error rate) is at 0% after training. I have only a small dataset for simple voice recognition (just the words "yes" and "no" in another language). I trained with sphinxtrain (126 training files, 12 test files). Each audio file is about 5 s long and contains 8 words (a mix of yes/no).
After training I decided to take my test files and run them through pocketsphinx. Nearly every file I tested had at least one word error. Sometimes it recognized one or two more words than expected; sometimes it recognized a "yes" as a "no".
I'd like to know why I'm getting different results from sphinxtrain and pocketsphinx.
You do not have enough training data.
I'd also like to know how I can improve my results with pocketsphinx (especially the problem that pocketsphinx recognizes one "no" as two "no"s).
Use more training data.
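To compare the two decoders on the same footing, it may help to score the pocketsphinx output yourself with the standard WER definition: word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. The following is a minimal sketch of that calculation, not the scoring code sphinxtrain itself uses; the function names `word_errors` and `wer` and the transcripts are made up for illustration, with "yes"/"no" standing in for the actual words.

```python
def word_errors(reference, hypothesis):
    """Minimum number of word substitutions, insertions and deletions
    needed to turn the reference transcript into the hypothesis."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[-1][-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance divided by reference length."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference transcript is empty")
    return word_errors(reference, hypothesis) / n_ref


# Hypothetical transcripts for one 8-word test file.
reference = "yes no yes yes no no yes no"
extra_word = "yes no no yes yes no no yes no"  # one "no" heard twice
swapped = "yes no yes no no no yes no"         # one "yes" heard as "no"

print(wer(reference, extra_word))  # 0.125
print(wer(reference, swapped))     # 0.125
```

With 8 words per file, a single inserted or substituted word already costs 12.5% WER on that file, so a result of 0% on the test set and "at least one error per file" from pocketsphinx cannot both describe the same decoding run with the same settings.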