I am building an n8n workflow that transcribes WhatsApp audio messages using the Google Speech-to-Text API.
Audio messages longer than 60 seconds are very common, so I am using the longrunningrecognize endpoint.
I use the Baileys API to receive the audio as base64 via a webhook, and then I send this body in the POST request to start the transcription:
{
"config": {
"encoding": "LINEAR16",
"sampleRateHertz": 16000,
"languageCode": "pt-BR"
},
"audio": {
"content": "{{$node["Code1"].json.audio.content}}"
}
}
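For readers who want to reproduce the request outside n8n, here is a minimal Python sketch of how a body like the one above could be assembled. The function name `build_recognize_body` is hypothetical, and it assumes the input bytes really are 16-bit PCM at the declared sample rate; if the bytes are in some other format, the declared `encoding` will not match the payload.

```python
import base64


def build_recognize_body(pcm_bytes: bytes,
                         sample_rate_hz: int = 16000,
                         language_code: str = "pt-BR") -> dict:
    """Build a longrunningrecognize request body from raw audio bytes.

    Assumption: pcm_bytes is 16-bit little-endian PCM (LINEAR16) recorded
    at sample_rate_hz. This function does not verify that.
    """
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
        },
        "audio": {
            # The "content" field carries the audio as base64 text.
            "content": base64.b64encode(pcm_bytes).decode("ascii"),
        },
    }
```

The returned dict can be serialized with `json.dumps` and sent as the POST body.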
The response to this request is the operation name. Then I make a GET request to this URL: https://speech.googleapis.com/v1/operations/{{ $json.name }}
The response indicates that the operation succeeded, but it doesn't include the transcription (see this example):
[
{
"name": "5289922065129560738",
"metadata": {
"@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
"progressPercent": 100,
"startTime": "2025-01-17T15:02:45.872249Z",
"lastUpdateTime": "2025-01-17T15:02:46.432154Z"
},
"done": true,
"response": {
"@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
"totalBilledTime": "3s",
"requestId": "5289922065129560738"
}
}
]
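The polling step can be sketched in Python as follows. This is only an illustration under assumptions: `wait_for_operation` and `has_results` are hypothetical helper names, and `fetch` stands in for whatever performs the authenticated GET on the operations URL and returns the parsed JSON object. The second helper flags the situation shown above, where `done` is true but the response carries no `results`.

```python
import time
from typing import Callable


def wait_for_operation(fetch: Callable[[str], dict],
                       name: str,
                       interval_s: float = 2.0,
                       max_attempts: int = 30) -> dict:
    """Poll an operation until it reports done, then return it.

    fetch(name) is assumed to return the operation JSON as a dict.
    """
    for _ in range(max_attempts):
        operation = fetch(name)
        if operation.get("done"):
            # A finished operation carries either "error" or "response".
            if "error" in operation:
                raise RuntimeError(f"operation failed: {operation['error']}")
            return operation
        time.sleep(interval_s)
    raise TimeoutError(f"operation {name} not done after {max_attempts} polls")


def has_results(operation: dict) -> bool:
    """Return True if the finished operation contains any results."""
    return bool(operation.get("response", {}).get("results"))
```

Checking `has_results` right after polling makes an empty transcription visible as its own case instead of looking like a success.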
The example I found in the documentation is very similar, but its response also includes a results array with the transcript:
{
"name": "7612202767953098924",
"metadata": {
"@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
"progressPercent": 100,
"startTime": "2017-07-20T16:36:55.033650Z",
"lastUpdateTime": "2017-07-20T16:37:17.158630Z"
},
"done": true,
"response": {
"@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
"results": [
{
"alternatives": [
{
"transcript": "how old is the Brooklyn Bridge",
"confidence": 0.96096134,
}
]
},
{
"alternatives": [
{
...
}
]
}
]
}
}
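Given a response shaped like the documentation example, the transcript can be pulled out with a few lines of Python. This is a sketch, not an official client: `extract_transcript` is a hypothetical name, and it takes the first alternative of each result and joins them with spaces. It returns an empty string when `results` is missing, which is what the response from my request would produce.

```python
def extract_transcript(operation: dict) -> str:
    """Join the first alternative's transcript from every result."""
    results = operation.get("response", {}).get("results", [])
    parts = []
    for result in results:
        alternatives = result.get("alternatives") or []
        if alternatives:
            # Keep only the first alternative of each result.
            text = alternatives[0].get("transcript", "").strip()
            if text:
                parts.append(text)
    return " ".join(parts)
```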
Can anyone help me figure out what's wrong?
I was expecting the response to include my transcription.
EDIT: I solved it. The issue turned out to be the audio file conversion. I started using ffmpeg to convert the audio and now it's all working.
SOLVED:
In the end, the issue was in the audio file conversion. The file was probably getting corrupted, so Google received a file with no usable audio.
I installed ffmpeg in my n8n instance and used it to do the conversion. Everything worked fine after that.
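The exact ffmpeg command is not shown above, so the following is only a sketch of one conversion that matches the request config (LINEAR16, 16000 Hz). It assumes `ffmpeg` is on the PATH and that a single audio channel is wanted; the helper names `ffmpeg_to_linear16_cmd` and `convert` are hypothetical.

```python
import subprocess


def ffmpeg_to_linear16_cmd(src: str, dst: str,
                           sample_rate_hz: int = 16000) -> list:
    """Build an ffmpeg command that writes mono 16-bit PCM audio."""
    return [
        "ffmpeg",
        "-y",                        # overwrite dst if it already exists
        "-i", src,                   # input file in whatever format it arrived
        "-ac", "1",                  # one audio channel (assumption)
        "-ar", str(sample_rate_hz),  # must match sampleRateHertz in the config
        "-c:a", "pcm_s16le",         # 16-bit little-endian PCM
        dst,
    ]


def convert(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_to_linear16_cmd(src, dst), check=True)
```

For example, `convert("voice.ogg", "voice.wav")` would produce a file whose bytes can then be base64-encoded for the `content` field; the file names here are placeholders.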