
Google Speech to Text - Not Getting the transcription | SOLVED


I am working on a workflow to transcribe audio messages from WhatsApp using the Google Speech-to-Text API and n8n.

It is very common to receive audio longer than 60 seconds, so I am using the longrunningrecognize endpoint.

I use the Baileys API to get the audio as base64 via a webhook, and then I send this body in the POST request that starts the transcription:

{
  "config": {
    "encoding": "LINEAR16",
    "sampleRateHertz": 16000,
    "languageCode": "pt-BR"
  },
  "audio": {
    "content": "{{$node["Code1"].json.audio.content}}"
  }
}
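For comparison, here is a minimal Python sketch of how a body like the one above could be assembled outside n8n. The function name `build_recognize_body` is hypothetical, and the defaults simply mirror the values in the request; it assumes the audio is already raw 16-bit PCM at 16 kHz, which is exactly what `LINEAR16` and `sampleRateHertz` claim about it.

```python
import base64


def build_recognize_body(audio_bytes: bytes,
                         encoding: str = "LINEAR16",
                         sample_rate_hertz: int = 16000,
                         language_code: str = "pt-BR") -> dict:
    """Build a JSON-serializable body for a longrunningrecognize request.

    Hypothetical helper: `audio_bytes` is the raw audio, which is
    base64-encoded here and placed in `audio.content`.
    """
    if not audio_bytes:
        raise ValueError("audio_bytes is empty; nothing to transcribe")
    return {
        "config": {
            "encoding": encoding,
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        "audio": {
            # Plain base64 text: no data-URI prefix, no line breaks.
            "content": base64.b64encode(audio_bytes).decode("ascii"),
        },
    }
```

The `config` block only describes the audio; it does not convert it. If the bytes are in a different format than the one declared, the request can still be accepted and processed.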

The response to this request contains the operation name. I then make a GET request to this URL: https://speech.googleapis.com/v1/operations/{{ $json.name }}
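The GET step is a poll: the operation may not be finished on the first request. Below is a rough sketch of that loop in Python. `wait_for_operation` and the `fetch_operation` callable are hypothetical names; `fetch_operation` stands in for whatever performs the authenticated GET and returns the parsed JSON, so the sketch makes no assumption about the HTTP client.

```python
import time
from typing import Callable


def wait_for_operation(name: str,
                       fetch_operation: Callable[[str], dict],
                       interval_s: float = 2.0,
                       max_attempts: int = 30) -> dict:
    """Poll an operation until `done` is true, then return it.

    `fetch_operation(name)` is assumed to GET the operation by name
    and return the decoded JSON object.
    """
    for _ in range(max_attempts):
        operation = fetch_operation(name)
        if operation.get("done"):
            # A finished operation carries either `error` or `response`.
            if "error" in operation:
                raise RuntimeError(f"operation failed: {operation['error']}")
            return operation
        time.sleep(interval_s)
    raise TimeoutError(f"operation {name} not done after {max_attempts} polls")
```

In n8n the same idea is usually expressed with a Wait node and an IF node looping back to the HTTP Request node; the interval and attempt limit here are arbitrary.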

I get a response indicating that the operation succeeded, but it does not include the transcription. For example:

[
  {
    "name": "5289922065129560738",
    "metadata": {
      "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
      "progressPercent": 100,
      "startTime": "2025-01-17T15:02:45.872249Z",
      "lastUpdateTime": "2025-01-17T15:02:46.432154Z"
    },
    "done": true,
    "response": {
      "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
      "totalBilledTime": "3s",
      "requestId": "5289922065129560738"
    }
  }
]
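A response like this one, with `done: true` and a `totalBilledTime` but no `results` key, usually means the audio was processed and no speech was recognized in it. The sketch below shows one way to make that case explicit instead of silently getting an empty value. `extract_transcript` is a hypothetical helper; it takes a single operation object, so the outer list that n8n wraps around the item has to be unwrapped first.

```python
def extract_transcript(operation: dict) -> str:
    """Join the top alternative of each result into one string.

    Returns an empty string when the response has no `results`,
    which is the situation shown in the example above.
    """
    response = operation.get("response", {})
    pieces = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            # The first alternative is the most probable one.
            pieces.append(alternatives[0].get("transcript", "").strip())
    return " ".join(piece for piece in pieces if piece)


def require_transcript(operation: dict) -> str:
    """Like extract_transcript, but fail loudly on an empty result."""
    transcript = extract_transcript(operation)
    if not transcript:
        raise ValueError(
            "operation finished without results; check that the audio "
            "really matches the declared encoding and sample rate"
        )
    return transcript
```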

The example I found in the documentation is very similar, but its response also contains a results array with the transcript:

{
  "name": "7612202767953098924",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 100,
    "startTime": "2017-07-20T16:36:55.033650Z",
    "lastUpdateTime": "2017-07-20T16:37:17.158630Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
    "results": [
      {
        "alternatives": [
          {
            "transcript": "how old is the Brooklyn Bridge",
            "confidence": 0.96096134
          }
        ]
      },
      {
        "alternatives": [
          {
            ...
          }
        ]
      }
    ]
  }
}

Can anyone help me figure out what is wrong?

I was expecting the response to include my transcription.

EDIT: I solved it. The issue turned out to be the audio file conversion. I started using ffmpeg to convert the audio, and now everything works.
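A conversion problem like this can be caught before the audio is sent. The sketch below uses Python's standard `wave` module to check that a byte string is a readable WAV file whose properties match the request config. The helper names are hypothetical, and it assumes the converted audio is a WAV container with 16-bit mono PCM; `wave.open` raises `wave.Error` for data that is not a RIFF/WAV file, and may raise `EOFError` for truncated data.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Read the basic properties of a WAV file held in memory."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hertz": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


def matches_config(info: dict, sample_rate_hertz: int = 16000) -> bool:
    """True if the audio is non-empty 16-bit mono PCM at the given rate."""
    return (
        info["channels"] == 1
        and info["sample_width_bytes"] == 2
        and info["sample_rate_hertz"] == sample_rate_hertz
        and info["duration_s"] > 0
    )
```

A file that fails this check, or that cannot be opened at all, is a likely candidate for the empty response described in the question.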


Solution

  • SOLVED:

    In the end, the issue was the audio file conversion. The file was probably getting corrupted, so Google received a file with no audio.

    I installed ffmpeg on my n8n instance and now use it for the conversion. Everything has worked fine since.
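The answer does not give the exact ffmpeg command, so the following is only a sketch of one conversion that matches the request config from the question (LINEAR16 at 16000 Hz). The function names and file paths are hypothetical; the flags used are standard ffmpeg options: `-ac` for channel count, `-ar` for sample rate, and `-c:a pcm_s16le` for 16-bit little-endian PCM.

```python
import subprocess


def ffmpeg_to_linear16_cmd(src: str, dst: str,
                           sample_rate_hertz: int = 16000) -> list:
    """Build an ffmpeg command that converts `src` to 16-bit mono PCM."""
    return [
        "ffmpeg",
        "-y",                           # overwrite dst if it already exists
        "-i", src,                      # input file, any format ffmpeg can read
        "-ac", "1",                     # downmix to mono
        "-ar", str(sample_rate_hertz),  # resample to the declared rate
        "-c:a", "pcm_s16le",            # 16-bit little-endian PCM (LINEAR16)
        dst,
    ]


def convert_to_linear16(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_to_linear16_cmd(src, dst), check=True)
```

For example, `convert_to_linear16("voice.ogg", "voice.wav")` would produce a WAV file whose contents can then be base64-encoded into `audio.content`. This requires the `ffmpeg` binary to be on the PATH of the machine running the workflow.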