Tags: speech-recognition, speech-to-text, cmusphinx, ibm-watson, google-speech-api

Is there any speech recognition API besides Google that returns interim results?


I am looking for a speech recognition API that returns interim results while the user is speaking, similar to what Google does on its homepage (https://www.google.com). The API must support French. I want to build a web application that works like Google voice search.

  • Google Speech API is not recommended for professional development, since it changes often and is not completely documented.
  • IBM Watson doesn't support French.
  • AT&T Speech API doesn't return interim results.
  • CMU Sphinx returns incredibly bad results (see a demo here: http://syl22-00.github.io/pocketsphinx.js/live-demo.html)
  • Nuance products don't seem to be made for web applications. (If you know how to use them in one, I am interested!)

Solution

  • Microsoft's Project Oxford Speech Recognition API, used by Cortana and Skype Translator, meets both of your criteria: it supports French (and 6 other languages) and returns partial/interim/online hypotheses as you stream audio to it.

    (As an aside, the usual cause of terrible accuracy in online recognition with Pocketsphinx is bad CMN (cepstral mean normalization). When you give Pocketsphinx a complete piece of audio, it computes the CMN over the entire utterance, but when you stream audio to it, it does not compute the CMN by default. One solution is to give it a complete utterance, retrieve the CMN that Pocketsphinx computed, and then use that CMN for the streaming audio. Note that the CMN is different for each audio channel/environment, and that the Python interface to Pocketsphinx doesn't expose the CMN data. I have a patch if this is a route you'd like to investigate.)
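
To make the CMN idea above concrete, here is a minimal pure-Python sketch of the workaround: estimate the cepstral mean once from a complete calibration utterance, then subtract that fixed mean from each frame as it streams in. It does not call Pocketsphinx at all. The function names (`cepstral_mean`, `normalize_batch`, `normalize_stream`) are hypothetical, and it assumes each frame is already a list of cepstral coefficients, so treat it as an illustration of the technique rather than as what Pocketsphinx does internally.

```python
from typing import Iterable, Iterator, List, Sequence

Frame = Sequence[float]  # one frame of cepstral coefficients (assumed input format)


def cepstral_mean(frames: Sequence[Frame]) -> List[float]:
    """Per-coefficient mean over every frame of a complete utterance."""
    if not frames:
        raise ValueError("need at least one frame to estimate the mean")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def subtract_mean(frame: Frame, mean: Sequence[float]) -> List[float]:
    """Remove the channel/environment offset from a single frame."""
    return [coeff - m for coeff, m in zip(frame, mean)]


def normalize_batch(frames: Sequence[Frame]) -> List[List[float]]:
    """Batch case: the mean comes from the same utterance being normalized."""
    mean = cepstral_mean(frames)
    return [subtract_mean(frame, mean) for frame in frames]


def normalize_stream(frames: Iterable[Frame], initial_mean: Sequence[float]) -> Iterator[List[float]]:
    """Streaming case: frames arrive one at a time, so reuse a mean that was
    estimated earlier from a full utterance recorded on the same channel."""
    for frame in frames:
        yield subtract_mean(frame, initial_mean)


if __name__ == "__main__":
    # Hypothetical toy data: two-coefficient frames instead of real cepstra.
    calibration = [[1.0, 2.0], [3.0, 4.0]]
    mean = cepstral_mean(calibration)
    for normalized in normalize_stream([[2.0, 3.0], [4.0, 1.0]], mean):
        print(normalized)
```

Because the mean depends on the microphone and the room, the calibration utterance should be recorded under the same conditions as the audio you later stream.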