Tags: google-cloud-platform, google-api, gcloud, speech-to-text

Can I specify the model (e.g. "video") in the Google Cloud Speech-to-Text API when using the gcloud tool?


Google's Speech-to-Text service offers several models for transcribing speech to text (standard, video, phone call, etc.). Google's documentation shows how to select these models when sending requests to the Speech-to-Text API from Python or via curl. However, I am using gcloud ml speech recognize to make requests to that API, and I want to be able to specify the model there. I've read pages and pages of documentation trying to figure this out, but no luck yet.

My command-line script:

gcloud ml speech recognize test.wav --language-code=EN --useEnhanced=true

I've also tried --model=video instead of --useEnhanced=true.

Google's response:

ERROR: (gcloud.ml.speech.recognize) unrecognized arguments: --useEnhanced=true

To search the help text of gcloud commands, run:
  gcloud help -- SEARCH_TERMS

Please help! Thanks :)


Solution

  • I wasn't able to get it working with the gcloud tool, but I was able to do it "manually" with cURL by following the docs here: https://cloud.google.com/speech-to-text/docs/quickstart-protocol. Create a service account with the appropriate role, download its JSON private key, and run export GOOGLE_APPLICATION_CREDENTIALS=path-to-credentials.json. Then create a JSON file with your request. Mine looked like this:

    {
        "config": {
            "languageCode": "en-US",
            "useEnhanced": true,
            "model": "video"
        },
        "audio": {
            "uri": "gs://bucket/audio.flac"
        }
    }
    

    Then just run the cURL command the docs give for the recognize endpoint (changing the file name to the JSON file you created) and you should be good to go.

    Here are the docs for the recognize endpoint: https://cloud.google.com/speech-to-text/docs/reference/rest/v1/speech/recognize. You can click through to the RecognitionConfig and RecognitionAudio objects to see which fields you can include in your JSON file.
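A minimal shell sketch of the request-file and cURL steps above, adapted for a local file like the test.wav in the question instead of a gs:// URI. It assumes bash, curl and base64 are available, that gcloud is installed with GOOGLE_APPLICATION_CREDENTIALS exported as described, and that the WAV header carries the encoding and sample rate (otherwise add "encoding" and "sampleRateHertz" to config). The helper names write_local_request and recognize are hypothetical. The endpoint URL and the RecognitionAudio "content" field follow the v1 REST reference linked above, but check them against the current docs before relying on this.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions (not from the original answer): the hypothetical
# helper names below, the en-US language code, a WAV/FLAC header that
# supplies encoding and sample rate, and gcloud installed with
# GOOGLE_APPLICATION_CREDENTIALS already exported.

# Write a request body for a *local* audio file. RecognitionAudio takes
# either "uri" (a gs:// path) or "content" (the audio bytes, base64-encoded).
write_local_request() {
  local audio_file="$1" out_file="$2"
  local b64
  # Strip newlines so the base64 string stays on one JSON line
  # (works with both GNU and BSD base64).
  b64="$(base64 < "$audio_file" | tr -d '\n')"
  cat > "$out_file" <<EOF
{
  "config": {
    "languageCode": "en-US",
    "useEnhanced": true,
    "model": "video"
  },
  "audio": {
    "content": "$b64"
  }
}
EOF
}

# POST a request file to the synchronous v1 recognize endpoint, using an
# access token minted from the application-default credentials.
recognize() {
  local request_file="$1"
  curl -s \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    "https://speech.googleapis.com/v1/speech:recognize" \
    -d @"$request_file"
}
```

For the question's file, that would be write_local_request test.wav request.json followed by recognize request.json. A request file that points at a gs:// URI, like the one in the answer, can be passed to recognize directly. Keeping the body in a file mirrors the docs' -d @file pattern and keeps a large base64 payload off the command line.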