Tags: iphone, speech-recognition, openears, cmusphinx

How to do Chinese speech recognition on the iPhone


Can OpenEars do Chinese speech recognition? See here: http://www.politepix.com/openears


Solution

  • I'm the OpenEars developer. OpenEars only does English-language speech recognition out of the box. There is a Mandarin acoustic model in the Pocketsphinx distribution that OpenEars uses, so it might be possible to substitute it for the English acoustic model in the instructions, if you have your own method for creating a compatible language model and phonetic dictionary and you're up for doing some self-directed research and testing. The acoustic model is called tdt_sc_8k. You would use it instead of the folder in the instructions called hub4wsj_sc_8k, but there is more you'd need to do to get it working.
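
    As a first sanity check before wiring anything into an app, you could verify that the tdt_sc_8k folder actually contains the files a semi-continuous Sphinx acoustic model normally ships with. The sketch below is only illustrative; the expected file list is typical for these models but can vary, so treat a mismatch as a prompt to investigate rather than a definite error:

        import os

        # Files typically found in a semi-continuous Sphinx acoustic model
        # such as hub4wsj_sc_8k; tdt_sc_8k should look broadly similar.
        # Exact contents vary between models, so this is only a rough check.
        EXPECTED = [
            "feat.params",
            "mdef",
            "means",
            "variances",
            "mixture_weights",
            "transition_matrices",
            "noisedict",
        ]

        def check_acoustic_model(path):
            """Print which of the expected acoustic model files are present."""
            for name in EXPECTED:
                present = os.path.exists(os.path.join(path, name))
                print(("ok      " if present else "MISSING ") + name)

        check_acoustic_model("tdt_sc_8k")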

    If you wanted to try this, you'd want to read the Sphinx project documentation at the CMU speech site to get a clear understanding of the relationship between the acoustic model, the language model, and the phonetic dictionary, and to figure out how to create a compatible language model. You might be able to start with the phonetic dictionary on this page as a master dictionary, since it is presumably compatible with the acoustic model, and create smaller, iPhone-sized phonetic dictionaries and then language models from it (see the sketch below). The language model on that page is far too large for OpenEars; for testing I would create a command-and-control model of around 100 words. You should then be able to use the Sphinx Knowledge Base Tool to create the language model from the same corpus of words you built the phonetic dictionary from.
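
    For illustration, carving a small dictionary out of a master dictionary could look something like this. The filenames and command words are made up; the only assumption is the usual Sphinx dictionary layout of one word per line, separated from its phonemes by whitespace, with alternate pronunciations marked like WORD(2):

        # Hypothetical sketch: extract a small command-and-control phonetic
        # dictionary from a large master dictionary. Filenames are examples.

        def build_small_dictionary(master_path, corpus_words, out_path):
            wanted = set(corpus_words)
            with open(master_path, encoding="utf-8") as master, \
                 open(out_path, "w", encoding="utf-8") as out:
                for line in master:
                    fields = line.split(None, 1)
                    if not fields:
                        continue
                    # Strip an alternate pronunciation marker such as "(2)"
                    # before matching against the wanted vocabulary.
                    word = fields[0].split("(", 1)[0]
                    if word in wanted:
                        out.write(line)

        # A tiny example vocabulary; the same word list would serve as the
        # corpus you submit to the Sphinx Knowledge Base Tool.
        corpus = ["你好", "开始", "停止", "打开", "关闭"]
        build_small_dictionary("master_mandarin.dic", corpus, "commands.dic")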

    The next step would be to verify that your acoustic model, language model, and phonetic dictionary work together in a conventional Pocketsphinx install, for instance on Linux. If you get good results with that, you could come over to the OpenEars forum and I will try to help you get it working in OpenEars (no guarantees, since that acoustic model has never been part of testing, but I also can't think of a particular reason it wouldn't work). OpenEars' LanguageModelGenerator class will definitely only work with English. You are also responsible for making sure the acoustic model is licensed in a way that permits use in an App Store app, if that is how you plan to distribute your project.
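
    If it helps, a Linux-side check with the Pocketsphinx Python bindings might look like the following sketch. It uses the classic SWIG-based Decoder API, which differs between Pocketsphinx versions, and the file names are placeholders; note that an _sc_8k model expects 8 kHz audio:

        from pocketsphinx.pocketsphinx import Decoder

        # Point the decoder at the three pieces under test. Paths are
        # placeholders, and the API shown is the classic SWIG binding,
        # which may differ in other Pocketsphinx versions.
        config = Decoder.default_config()
        config.set_string("-hmm", "tdt_sc_8k")      # acoustic model folder
        config.set_string("-lm", "commands.lm")     # language model
        config.set_string("-dict", "commands.dic")  # phonetic dictionary
        decoder = Decoder(config)

        # The _sc_8k models are 8 kHz models, so the test file should be
        # raw 16-bit mono PCM recorded at 8000 Hz.
        decoder.start_utt()
        with open("test_utterance.raw", "rb") as f:
            while True:
                buf = f.read(1024)
                if not buf:
                    break
                decoder.process_raw(buf, False, False)
        decoder.end_utt()

        hyp = decoder.hyp()
        print("Hypothesis:", hyp.hypstr if hyp else "(no result)")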

    Good luck!

    EDIT: I wanted to update this to let you know that the Mandarin acoustic model is now part of OpenEarsExtras, and that LanguageModelGenerator has been updated so that you can give it an arbitrary master phonetic dictionary of your choice, provided it has the correct formatting: each entry is the word, followed by a tab, followed by the phonemes, followed by a line break, with entries alphabetized. That should make it much easier to use the dynamic language modeler with languages other than English if you already have an acoustic model.
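
    For example, a short script could normalize an existing dictionary into that shape, tab-separated and alphabetized. This is only a sketch with made-up filenames, assuming the input already has one word-plus-phonemes entry per line:

        # Hypothetical sketch: rewrite a phonetic dictionary into the format
        # described above: word, tab, phonemes, line break, entries
        # alphabetized. Filenames are examples.

        def normalize_dictionary(in_path, out_path):
            entries = []
            with open(in_path, encoding="utf-8") as f:
                for line in f:
                    fields = line.split(None, 1)
                    if len(fields) < 2:
                        continue  # skip blank or malformed lines
                    entries.append((fields[0], fields[1].strip()))
            entries.sort(key=lambda entry: entry[0])
            with open(out_path, "w", encoding="utf-8") as f:
                for word, phones in entries:
                    f.write(word + "\t" + phones + "\n")

        normalize_dictionary("raw_mandarin.dic", "master_mandarin.dic")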

    The way it ought to work is that you supply a lookup dictionary in your target language, analogous to cmu07a.dic, the default English lookup dictionary, and LanguageModelGenerator handles the rest. So my earlier statement that this requires multiple steps and research should no longer necessarily apply, as long as you have a phonetic dictionary that pronunciations can be looked up from. Feedback on how this works for you in practice would be very much appreciated at the OpenEars forum (please don't give feedback or bug reports here on Stack Overflow).