speech-recognition speech-to-text cmusphinx sphinx4

Speech Recognition using CMU Sphinx, JSAPI and Google Speech API


Speech recognition is one of the many features of my current project, which will most probably be developed in J2EE (other languages are also welcome if the choice is justified).

Most of the links on Google and on SO suggest the three options mentioned above: Sphinx 4, JSAPI directly, and the Google Speech API (making a server call to Google and then getting the result back as text).

What other freely available options do I have? And if I use Sphinx-4, how do I get a general English language model to use with it?


Solution

  • Yes, there are.

    1. You can use a wrapper around the Google Speech Recognizer; the call is basically one line of code. You send speech audio in FLAC or Speex format and receive the recognized text and a confidence score. The only problem is that Google can close the API, as it did with Google Translate.
    2. Another option is to use Sphinx (Sphinx4 or Pocketsphinx).
    3. You can use HTK (http://htk.eng.cam.ac.uk/) with HVite (the HTK decoder) or another decoder such as Julius (http://julius.sourceforge.jp/en/). There are also other tools that use HTK to train acoustic models and/or language models and grammars.

    VoxForge provides acoustic and language models for HTK and Sphinx (http://voxforge.org/).
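
To make the first option more concrete, here is a rough Java sketch of the kind of HTTP request such a wrapper would build: FLAC bytes sent as the POST body, with the sample rate in the content type. The endpoint URL, the `lang` query parameter, the `audio/x-flac; rate=...` content type, and the class and method names are all assumptions for illustration, not a documented Google API; substitute whatever the wrapper or service you actually use expects. It only builds the request and does not send it (Java 11+ for `java.net.http`).

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class SpeechRequestSketch {

    // Hypothetical placeholder endpoint; replace with the real service URL.
    static final String ENDPOINT = "https://speech.example.invalid/recognize";

    /**
     * Builds (but does not send) a POST request carrying FLAC audio.
     * The query parameter name and content type are assumptions.
     */
    static HttpRequest buildRequest(byte[] flacAudio, int sampleRateHz, String language) {
        String query = "?lang=" + URLEncoder.encode(language, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder()
                .uri(URI.create(ENDPOINT + query))
                // Tell the server the audio format and sample rate.
                .header("Content-Type", "audio/x-flac; rate=" + sampleRateHz)
                .POST(HttpRequest.BodyPublishers.ofByteArray(flacAudio))
                .build();
    }

    public static void main(String[] args) {
        // Dummy bytes stand in for a real FLAC recording.
        HttpRequest request = buildRequest(new byte[] {0, 1, 2}, 16000, "en-US");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
    }
}
```

Sending the request would be a single `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` call; the response format (recognized text plus confidence score) depends on the service, so parsing is left out here.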