ios · speech-recognition · voice-recognition · openears

Any difference between static & dynamic language model in OpenEars?


I have been trying to create a game using OpenEars. I am using the dynamic language model method, but the performance is not up to the mark: recognition accuracy is very low. Is there any advantage to using a static language model? Is there any other way to improve speech recognition?


Solution

  • OpenEars developer here. There are a few things that can result in sub-par recognition. Here are the really big ones:

    • Testing recognition using the Simulator rather than a real device
    • Having misspelled words in your language model (this is a big one that accounts for a very large number of reported issues: if a word is misspelled, its correct pronunciation can't be derived and entered in the phonetic dictionary, so correctly-pronounced utterances of that word produce false negatives; see the first sketch after this list)
    • Having extraneous punctuation in your language model. Check for this by looking at the .arpa file contents and the .dic file contents and seeing whether the entries in each match each other (the second sketch after this list automates this comparison).
    • Having a native-speaker accent which is very different from the US accents the acoustic model was trained on, or having a non-native-speaker accent (this isn't fair, but it's reality)
    • Having the language model largely consist of non-English words such as non-English last names, non-English street names, or intentionally-misspelled band/startup names, since the pronunciations of such words all end up being estimated rather than looked up.
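
    To make the misspelling and punctuation checks concrete, here is a minimal Swift sketch (OpenEars projects are often Objective-C, but the idea is the same) that cleans a raw word list and flags unknown words with iOS's UITextChecker before the list goes to your language model generator. The prepareVocabulary helper is my own name for illustration, not part of OpenEars, and the uppercasing follows the convention of older OpenEars examples; check what your version expects.

        import UIKit

        // Hypothetical pre-flight helper (not part of OpenEars): clean a raw
        // vocabulary list before generating a language model. Punctuation is
        // stripped so it can't leak into the .arpa/.dic entries, and words the
        // system spell checker doesn't know are flagged, since misspellings
        // lead to estimated (often wrong) pronunciations.
        func prepareVocabulary(_ rawWords: [String]) -> [String] {
            let checker = UITextChecker()

            return rawWords.compactMap { raw -> String? in
                // Keep only letters and apostrophes; commas, periods, and
                // quotes are common sources of bogus dictionary entries.
                let cleaned = raw.filter { $0.isLetter || $0 == "'" }
                guard !cleaned.isEmpty else { return nil }

                // Warn about words missing from the en_US dictionary: their
                // pronunciations will be estimated and may produce false negatives.
                let range = NSRange(location: 0, length: cleaned.utf16.count)
                let misspelled = checker.rangeOfMisspelledWord(
                    in: cleaned, range: range, startingAt: 0,
                    wrap: false, language: "en_US")
                if misspelled.location != NSNotFound {
                    print("Warning: '\(cleaned)' is not in the en_US dictionary")
                }
                // Older OpenEars examples use all-uppercase entries; adjust
                // to whatever your OpenEars version expects.
                return cleaned.uppercased()
            }
        }

        // Usage: prepareVocabulary(["Hello,", "wrold!", "start"]) strips the
        // punctuation and warns that "wrold" is likely misspelled.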
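
    And for the .arpa/.dic comparison, a diagnostic sketch that diffs the vocabulary in the two generated files. compareModelFiles is a hypothetical helper, not an OpenEars API, and it assumes the standard CMU pronunciation dictionary layout ("WORD  W ER D", with alternates written as "WORD(2)") and the standard ARPA layout (unigrams between "\1-grams:" and the next "\" section header); any words that appear in one file but not the other usually point to stray punctuation or casing differences.

        import Foundation

        // Diff the vocabulary in a generated .dic file against the unigrams
        // in the matching .arpa file. Words that appear in only one of the
        // two files are likely victims of punctuation or casing problems.
        func compareModelFiles(dicPath: String, arpaPath: String) throws {
            // .dic: the first whitespace-separated field of each line is the word.
            let dicText = try String(contentsOfFile: dicPath, encoding: .utf8)
            let dicWords = Set(dicText.split(separator: "\n").compactMap { line -> String? in
                guard let first = line.split(whereSeparator: { $0 == " " || $0 == "\t" }).first
                else { return nil }
                // Strip the "(2)" suffix that marks alternate pronunciations.
                return String(first).replacingOccurrences(
                    of: #"\(\d+\)$"#, with: "", options: .regularExpression)
            })

            // .arpa: collect the second field of every line in the 1-grams section.
            let arpaText = try String(contentsOfFile: arpaPath, encoding: .utf8)
            var inUnigrams = false
            var arpaWords = Set<String>()
            for line in arpaText.split(separator: "\n") {
                if line.hasPrefix("\\1-grams:") { inUnigrams = true; continue }
                if inUnigrams && line.hasPrefix("\\") { break } // next section starts
                guard inUnigrams else { continue }
                let fields = line.split(whereSeparator: { $0 == " " || $0 == "\t" })
                if fields.count >= 2 { arpaWords.insert(String(fields[1])) }
            }
            arpaWords.subtract(["<s>", "</s>", "<UNK>"]) // sentence markers, not vocabulary

            print("In .dic but not .arpa: \(dicWords.subtracting(arpaWords).sorted())")
            print("In .arpa but not .dic: \(arpaWords.subtracting(dicWords).sorted())")
        }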

    But static language models versus dynamic language models have never been a big consideration for accuracy levels. If you'd like to troubleshoot this further, I'd recommend visiting the OpenEars support forums, where I'd be happy to help, since Stack Overflow is not intended for ongoing back-and-forth troubleshooting and this is probably one of those cases.