google-api google-cloud-platform speech-to-text

Test ride Speech-to-Text asynchronous operation - no results

I am trying out the long running recognize method of the Speech-to-Text API (https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/speech/longrunningrecognize) and specified all needed parameters such as:

{
  "audio": 
  {
    "uri": "gs://xyz/blabla.mp3"
  },
  "config": 
  {
    "languageCode": "en-US",
    "encoding": "AMR_WB",
    "sampleRateHertz": 16000
  }
}

This returned a name I can use with the get operation (https://cloud.google.com/speech-to-text/docs/reference/rest/v1/operations/get).

The documentation says the "operation" JSON object returned by get would include parameters that I do not see in the response.

For example, there is no "done" node. Instead this is all I get:

{
  "name": "xxxxx",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 100,
    "startTime": "2018-06-08T14:40:54.663240Z",
    "lastUpdateTime": "2018-06-08T15:05:01.161911Z"
  }
}

Any idea why that is? Shouldn't the response at least include a status, and possibly an error (https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/operations#Operation)?

UPDATE: Now I am getting a complete response, but it contains a server error. Is it only a temporary glitch?

{
  "name": "xxxxx",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 100,
    "startTime": "2018-06-08T14:40:54.663240Z",
    "lastUpdateTime": "2018-06-08T15:05:01.161911Z"
  },
  "done": true,
  "error": {
    "code": 13,
    "message": "Server unavailable, please try again later."
  }
}

Solution

At first sight, your request pairs an MP3 file, which is an unsupported format, with a supported audio encoding (AMR_WB), so the declared encoding does not match the file.
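
A mismatch like this can be caught before the request is sent. The following is a minimal sketch, not part of the API: the helper `encoding_matches_uri` and the extension table are hypothetical, and the table is an illustrative assumption about typical file extensions rather than something taken from the documentation.

```python
import os

# Illustrative mapping only (an assumption, not taken from the API docs):
# file extensions that typically go with some "encoding" values.
TYPICAL_EXTENSIONS = {
    "FLAC": {".flac"},
    "LINEAR16": {".wav", ".raw"},
    "AMR": {".amr"},
    "AMR_WB": {".awb"},
    "OGG_OPUS": {".ogg", ".opus"},
}


def encoding_matches_uri(request):
    """Return True if the URI extension looks plausible for the declared encoding."""
    uri = request["audio"]["uri"]
    encoding = request["config"]["encoding"]
    extension = os.path.splitext(uri)[1].lower()
    return extension in TYPICAL_EXTENSIONS.get(encoding, set())


# The request body from the question.
request = {
    "audio": {"uri": "gs://xyz/blabla.mp3"},
    "config": {
        "languageCode": "en-US",
        "encoding": "AMR_WB",
        "sampleRateHertz": 16000,
    },
}

print(encoding_matches_uri(request))  # False: ".mp3" is not a typical AMR_WB extension
```

A file extension is only a hint, so a check like this can flag a suspicious request but cannot prove that the audio data itself is valid.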

Let's suppose that this combination is OK. If you receive an empty response (no transcript is returned and no error has occurred), the encoding of your file is probably wrong. Work through the validation steps in the documentation to determine whether your sound file has problems; for example, the Cloud Speech-to-Text service currently supports only one audio channel.
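
If you have a WAV copy of the audio, the channel count and sample rate can be checked locally with Python's standard `wave` module. This is only a sketch: the helper names are hypothetical, the 16,000 Hz threshold is an assumed default, and the check does not apply to MP3 or AMR files, which `wave` cannot read.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, sample_width_bytes) for a WAV file."""
    with wave.open(path, "rb") as wav_file:
        return (
            wav_file.getnchannels(),
            wav_file.getframerate(),
            wav_file.getsampwidth(),
        )


def looks_transcribable(path, min_rate_hz=16000):
    """Check for mono, 16-bit audio at or above the given sample rate."""
    channels, rate_hz, width_bytes = describe_wav(path)
    return channels == 1 and rate_hz >= min_rate_hz and width_bytes == 2


def write_silent_wav(path, channels, rate_hz, seconds=1):
    """Write a silent 16-bit WAV file; used here only to demonstrate the check."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(rate_hz)
        wav_file.writeframes(b"\x00\x00" * channels * rate_hz * seconds)


# Demo with generated files instead of real recordings.
demo_dir = tempfile.mkdtemp()
mono_path = os.path.join(demo_dir, "mono.wav")
stereo_path = os.path.join(demo_dir, "stereo.wav")
write_silent_wav(mono_path, channels=1, rate_hz=16000)
write_silent_wav(stereo_path, channels=2, rate_hz=16000)

print(looks_transcribable(mono_path))    # True
print(looks_transcribable(stereo_path))  # False: two channels
```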

To narrow down your issue, you can convert your sound file following the best practices. Transcoding the file to lossless FLAC or LINEAR16 with a sampling rate of 16,000 Hz or higher should be enough; for the full recommendations, please read the best practices documentation.

The error in your last update seems to be temporary. Do you still face the issue?
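
As for the first response you posted: in the JSON form of these operation objects, fields that hold a default value such as `false` are usually omitted, so a missing "done" most likely means the operation had not finished yet, even though "progressPercent" already showed 100. A minimal sketch of how a client could interpret the object returned by get follows; `summarize_operation` is a hypothetical helper, and the layout of the successful "response" (results, alternatives, transcript) is assumed from the API reference rather than shown in your post.

```python
def summarize_operation(operation):
    """Classify an operation dict as running, failed, or succeeded."""
    # A missing "done" is treated the same as "done": false.
    if not operation.get("done", False):
        progress = operation.get("metadata", {}).get("progressPercent")
        return ("running", progress)
    if "error" in operation:
        error = operation["error"]
        return ("failed", (error.get("code"), error.get("message")))
    # Assumed success layout: response.results[].alternatives[].transcript
    results = operation.get("response", {}).get("results", [])
    transcripts = [
        result["alternatives"][0]["transcript"]
        for result in results
        if result.get("alternatives")
    ]
    return ("succeeded", transcripts)


# The two responses from the question, reduced to the relevant fields.
first = {"name": "xxxxx", "metadata": {"progressPercent": 100}}
second = {
    "name": "xxxxx",
    "metadata": {"progressPercent": 100},
    "done": True,
    "error": {"code": 13, "message": "Server unavailable, please try again later."},
}

print(summarize_operation(first))   # ('running', 100)
print(summarize_operation(second))  # ('failed', (13, 'Server unavailable, please try again later.'))
```

With a classification like this, a polling loop can keep calling get while the state is "running" and decide separately whether a "failed" result is worth resubmitting.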

If your issue persists with the new file, it would be a good idea to report it in the public issue tracker.

Regards!