How to get Google Cloud Speech (voice-to-text) to recognize letters and sounds


Is there a way to get the Google Cloud Speech API to recognize letters and letter sounds?

As an example use case, suppose I wanted to build a spelling game in which a voice says, "Spell restaurant," and the recognizer listens for each letter and recognizes it as it comes through.

Similarly, is there a way to identify specific letter sounds such as "oo", "ew", "k" (as in cat), or "s" (as in circle)?


Solution

  • It already seems to do a reasonable job, at least in some cases. For example, when I spell out "cee ay tee", it recognizes "c a t". It is also possible to provide "word hints", as described in this post:

    Google Cloud Speech API word Hints

    Supplying a list of single-letter "words" as hints, i.e.

    phrases = list("abcdefghijklmnopqrstuvwxyz")  # ['a', 'b', 'c', ..., 'z']
    

    appears to give improved results in this area.
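
To make the hint idea above a bit more concrete, here is a minimal sketch that only builds the JSON body for the v1 `speech:recognize` REST method, with the single-letter phrases placed under `speechContexts`. It does not call the API. The helper names `build_recognize_request` and `letters_from_transcript` are hypothetical, and the `LINEAR16` encoding, 16 kHz sample rate, and `en-US` language code are assumptions you would adjust to your own audio.

```python
import base64
import json
import string


def build_recognize_request(audio_bytes, sample_rate_hertz=16000, language_code="en-US"):
    """Build a JSON-serializable body for the v1 speech:recognize REST method.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    # One single-letter "phrase" per letter of the alphabet, as suggested above.
    phrases = list(string.ascii_lowercase)
    return {
        "config": {
            "encoding": "LINEAR16",  # assumption: 16-bit linear PCM audio
            "sampleRateHertz": sample_rate_hertz,  # assumption: 16 kHz recording
            "languageCode": language_code,
            "speechContexts": [{"phrases": phrases}],
        },
        # The REST API expects inline audio as a base64-encoded string.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


def letters_from_transcript(transcript):
    """Keep only the single-letter tokens of a transcript such as 'c a t'."""
    return [tok.lower() for tok in transcript.split() if len(tok) == 1 and tok.isalpha()]


if __name__ == "__main__":
    # Placeholder audio: one second of 16-bit silence, not a real recording.
    body = build_recognize_request(b"\x00\x00" * 16000)
    print(json.dumps(body["config"], indent=2))
    print(letters_from_transcript("c a t"))
```

For a spelling game, you could compare the output of `letters_from_transcript` against the letters of the target word. Whether the hints actually help will depend on your audio and language settings, so it is worth testing with and without them.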