Better Acronym Recognition with Google Speech API using Speech Adaptation


I am using the Google Speech Streaming API and would like it to recognize unusual acronyms.

I have tried adding the acronym "LHD" to the speech recognition request, but when it does recognize an acronym, it returns "LED". So far it has never recognized "LHD".

Is there any way to improve the recognition or better indicate that this is an acronym?

My recognition request config is:

{
  config: {
    encoding: 'LINEAR16',
    sampleRateHertz: 16000,
    languageCode: 'en-US',
    model: 'video',
    enableAutomaticPunctuation: true,
    speechContexts: [ {
        phrases : [
            "LHD"
        ]
    } ]
  },
  interimResults: true
}

Solution

  • You should add a "boost". As explained in the Google Speech-to-Text docs, under "Fine-tune transcription results using boost":

    By default model adaptation provides a relatively small effect, especially for one-word phrases. The model adaptation boost feature allows you to increase the recognition model bias by assigning more weight to some phrases than others. We recommend that you implement boost if 1) you have already implemented model adaptation, and 2) you would like to further adjust the strength of model adaptation effects on your transcription results.

    Try changing this:

    phrases : [
            "LHD"
        ]
    

    To this:

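    phrases : [
            "LHD"
        ],
    boost: 10

    Note that boost belongs to the speech context entry, next to phrases, and applies to every phrase in that entry. The { "value": ..., "boost": ... } object form is used by phrase sets in the model adaptation API, not by speechContexts.

    In the end, you would have something like this:

    {
      config: {
        encoding: 'LINEAR16',
        sampleRateHertz: 16000,
        languageCode: 'en-US',
        model: 'video',
        enableAutomaticPunctuation: true,
        speechContexts: [ {
            phrases : [
                "LHD"
            ],
            boost: 10
        } ]
      },
      interimResults: true
    }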
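
    If you build the request in code, a small helper keeps the boost and the phrases together. This is only a minimal sketch: buildStreamingRequest is a hypothetical name, not part of the Google client library, and it assumes that boost is set on the speech context entry alongside phrases (as in the v1p1beta1 SpeechContext message). The positive-boost check is also an assumption on my part.

```javascript
// Hypothetical helper (not part of @google-cloud/speech): builds a streaming
// recognition request that biases the recognizer toward the given phrases.
function buildStreamingRequest(phrases, boost) {
  if (!Array.isArray(phrases) || phrases.length === 0) {
    throw new Error('phrases must be a non-empty array of strings');
  }
  // Assumption: only positive boost values are useful here.
  if (typeof boost !== 'number' || boost <= 0) {
    throw new Error('boost must be a positive number');
  }
  return {
    config: {
      encoding: 'LINEAR16',
      sampleRateHertz: 16000,
      languageCode: 'en-US',
      model: 'video',
      enableAutomaticPunctuation: true,
      // Assumption: boost sits next to phrases in the speech context entry.
      speechContexts: [{ phrases: [...phrases], boost }],
    },
    interimResults: true,
  };
}

const request = buildStreamingRequest(['LHD'], 10);
console.log(JSON.stringify(request.config.speechContexts));

// With the Node.js client, the request would then be passed along these lines:
//   const speech = require('@google-cloud/speech').v1p1beta1;
//   const stream = new speech.SpeechClient().streamingRecognize(request);
```

    Keeping the values as parameters makes it easy to try a few boost levels and compare transcripts, since the right value depends on your audio.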

    Edit: I think you must use Google\Cloud\Speech\V1p1beta1, not V1.

    Edit 2: Also have a look at class tokens and custom classes. The "Supported class tokens" page of the Cloud Speech-to-Text documentation explains why: you can include a class token such as "$OOV_CLASS_ALPHA_SEQUENCE" in your phrases to indicate that you are expecting a sequence of letters, such as an acronym.
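
    As a rough illustration of the class token idea, the token can be listed as a phrase next to the literal acronym. This is a sketch under assumptions: acronymContext is a hypothetical helper, and whether the token is supported for your language and model should be checked against the class token table in the docs.

```javascript
// Class token named in the answer above; it hints that a letter sequence is expected.
const ALPHA_SEQUENCE_TOKEN = '$OOV_CLASS_ALPHA_SEQUENCE';

// Hypothetical helper: one speech context entry holding the literal acronyms
// plus the class token, with a shared boost.
function acronymContext(acronyms, boost) {
  return {
    phrases: [...acronyms, ALPHA_SEQUENCE_TOKEN],
    boost,
  };
}

// This entry would go inside config.speechContexts of the recognition request.
console.log(acronymContext(['LHD'], 10));
```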