javascript speech-recognition webkitspeechrecognition

Webkit Speech Recognition API: Single Syllables


I'm trying to use the WebKit Speech Recognition API to recognize single syllables, such as "ah" or "bi", rather than full words or sentences. Since this API requires a "grammar" definition, I'm wondering whether there is a way to implement single-syllable recognition.

Thanks


Solution

Unfortunately, this isn't possible with the Web Speech API. Although you can create custom grammars (which are collections of words), you can't define custom dictionaries or vocabularies (which are the words themselves). In your case, you would need a custom vocabulary in which each syllable you want to recognize (for example "ah" or "bi") is its own "word", plus a grammar restricted to words from that vocabulary. A few paid cloud-based services allow you to do this.
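To make the grammar/vocabulary distinction concrete, here is a minimal JavaScript sketch of the grammar side. The helper names `buildSyllableGrammar` and `createSyllableRecognizer` are hypothetical; the JSGF string format and `SpeechGrammarList.addFromString()` are part of the Web Speech API, but grammar support varies between browsers and an engine may ignore the grammar entirely. Even when it is honored, a grammar can only choose among words the engine already knows, so "ah" or "bi" will typically come back as nearby dictionary words.

```javascript
// Hypothetical helper: builds a JSGF grammar whose single public rule is a
// choice between the given syllables (the format used by the Web Speech API).
function buildSyllableGrammar(syllables) {
  if (!Array.isArray(syllables) || syllables.length === 0) {
    throw new Error("Expected a non-empty array of syllables");
  }
  return `#JSGF V1.0; grammar syllables; public <syllable> = ${syllables.join(" | ")} ;`;
}

// Hypothetical helper: returns a configured recognizer in browsers that expose
// the (possibly webkit-prefixed) constructors, or null elsewhere (e.g. Node).
function createSyllableRecognizer(syllables, onAlternatives) {
  const Recognition = globalThis.SpeechRecognition || globalThis.webkitSpeechRecognition;
  const GrammarList = globalThis.SpeechGrammarList || globalThis.webkitSpeechGrammarList;
  if (!Recognition || !GrammarList) {
    return null;
  }

  const grammars = new GrammarList();
  // The second argument is the grammar's weight (0.0 to 1.0).
  grammars.addFromString(buildSyllableGrammar(syllables), 1);

  const recognition = new Recognition();
  recognition.grammars = grammars;
  recognition.lang = "en-US";
  recognition.interimResults = false;
  recognition.maxAlternatives = 5;
  recognition.onresult = (event) => {
    // Transcripts come from the engine's own vocabulary, so they are usually
    // dictionary words rather than the bare syllables listed in the grammar.
    const alternatives = Array.from(event.results[0], (alt) => alt.transcript.trim());
    onAlternatives(alternatives);
  };
  return recognition;
}
```

In a browser you would call `createSyllableRecognizer(["ah", "bi"], console.log)` and then `start()` on the returned object; outside a browser the function returns `null`.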

For example, with IBM Watson Speech to Text you could create a custom language model and add words to it (in your case, each syllable would be a "word"). Normally, a custom language model is blended with the general base model, which you wouldn't want here, so you would set the customization weight to its maximum of 1.0 so that the service favors words from your custom model as strongly as it allows.
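A rough JavaScript sketch of those Watson steps, assuming the v1 Speech to Text customization REST API. The helpers only build request descriptors (method, URL, body) so they can be inspected without credentials. The function names, `serviceUrl`, the customization ID, and the choice of spelling each syllable as its own `sounds_like` hint are illustrative assumptions; creating the custom model itself (a `POST` to `/v1/customizations`) is omitted, and the endpoints and parameter ranges should be checked against IBM's current documentation.

```javascript
// Hypothetical helpers that only assemble request descriptors for the IBM
// Watson Speech to Text v1 customization endpoints; nothing is sent here.
function customizationBase(serviceUrl, customizationId) {
  return `${serviceUrl}/v1/customizations/${encodeURIComponent(customizationId)}`;
}

// Adds every syllable to the custom model as its own "word".
function buildAddWordsRequest(serviceUrl, customizationId, syllables) {
  // Assumption: the spelling itself is used as the sounds_like hint.
  const words = syllables.map((s) => ({ word: s, sounds_like: [s], display_as: s }));
  return {
    method: "POST",
    url: `${customizationBase(serviceUrl, customizationId)}/words`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ words }),
  };
}

// Starts training so the added words become usable by the custom model.
function buildTrainRequest(serviceUrl, customizationId) {
  return { method: "POST", url: `${customizationBase(serviceUrl, customizationId)}/train` };
}

// Recognition URL that applies the custom model at the requested weight.
function buildRecognizeUrl(serviceUrl, customizationId, weight = 1.0) {
  if (!(weight >= 0 && weight <= 1)) {
    throw new RangeError("customization_weight must be between 0.0 and 1.0");
  }
  const params = new URLSearchParams({
    language_customization_id: customizationId,
    customization_weight: String(weight),
  });
  return `${serviceUrl}/v1/recognize?${params}`;
}
```

Each descriptor could be sent with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })` plus an `Authorization` header; the model generally needs to finish training before recognition can use the new words.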

There are other approaches too, but I doubt you'd find a purely web-based solution that doesn't involve a paid service. If you can move to a native platform (or run your own recognition service on a server), you have more options. For example, CMUSphinx lets you create a custom pronunciation dictionary for use with Sphinx4 on the server or PocketSphinx on mobile. Although CMUSphinx isn't the most accurate system for large-vocabulary applications, your vocabulary would be tiny, so it should perform well.
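As an illustration of how small such a CMUSphinx vocabulary could be, the sketch below generates the text of a pronunciation dictionary in the CMU format (one word per line, followed by its phones). The `SYLLABLE_PHONES` mapping and its ARPAbet transcriptions are hypothetical examples; the resulting file would be paired with a grammar that allows only these words and handed to the decoder, for example through PocketSphinx's `-dict` and `-jsgf` options.

```javascript
// Hypothetical syllable-to-phone mapping using CMU (ARPAbet) phone symbols;
// the transcriptions are illustrative and would need checking.
const SYLLABLE_PHONES = {
  ah: ["AA"],
  bi: ["B", "IY"],
};

// Builds CMU-style dictionary text: each line is a word followed by its
// space-separated phones, and the file ends with a newline.
function buildSphinxDictionary(phoneMap) {
  const lines = Object.entries(phoneMap).map(
    ([word, phones]) => `${word} ${phones.join(" ")}`
  );
  return lines.join("\n") + "\n";
}
```

For the mapping above, `buildSphinxDictionary(SYLLABLE_PHONES)` yields the two lines `ah AA` and `bi B IY`.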