google-cloud-speech

Google Cloud Speech-to-Text - How to get numbers as digits


I'm using the Google Cloud Speech-to-Text API, and it's working great!

In the response, I need numbers as digits (1, 2, 3) instead of words (one, two, three).

I've noticed that when a number appears inside a sentence, it is returned as a word.

Is there a parameter for this?

Thanks!


Solution

  • Depending on the context of your input, you can have spoken numbers transcribed as digits. Include speechContexts in your config and put a class token in its phrases field. Here is an example taken from the speech context documentation.

    For example, to improve the transcription of address numbers from your source audio, provide the value $ADDRESSNUM in your SpeechContext object.

    The config with speechContexts will look like this.

      "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 8000,
        "languageCode": "en-US",
        "speechContexts": [{
          "phrases": ["$ADDRESSNUM"]
        }]
      }
    

    $ADDRESSNUM is an example of a class token. When the speech is about addresses, it causes numbers that would otherwise be transcribed as words to be returned as digits.


    Many other class tokens are available. See the class tokens documentation for the full list.
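
As a minimal sketch of how the config above could be assembled in code, the Python snippet below builds the JSON body for a REST speech:recognize request using only the standard library. The helper name build_recognize_request and the placeholder audio bytes are assumptions made for illustration. The snippet does not send anything to the API.

```python
import base64
import json


def build_recognize_request(audio_bytes, class_tokens=("$ADDRESSNUM",)):
    """Hypothetical helper: build a JSON-serializable recognize request body."""
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": 8000,
            "languageCode": "en-US",
            # Class tokens go into the phrases field of a speech context.
            "speechContexts": [{"phrases": list(class_tokens)}],
        },
        # Inline audio is sent as base64-encoded text in a JSON request.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


if __name__ == "__main__":
    # Placeholder bytes only; real input would be raw LINEAR16 audio at 8000 Hz.
    body = build_recognize_request(b"\x00\x00" * 4)
    print(json.dumps(body, indent=2))
```

To try a different class token, pass it in class_tokens, for example build_recognize_request(audio, ["$OOV_CLASS_DIGIT_SEQUENCE"]). That token name is given from memory of the class tokens list, so check that it is supported for your language before relying on it.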