
How to convert LINEAR16 text-to-speech output to an audio file


I just started playing with the Google Text-to-Speech API. I sent a POST request to:

https://texttospeech.googleapis.com/v1/text:synthesize?fields=audioContent&key={YOUR_API_KEY}

with the following data:

{
 "input": {
  "text": "Hola esto es una prueba"
 },
 "voice": {
  "languageCode": "es-419"
 },
 "audioConfig": {
  "audioEncoding": "LINEAR16",
  "speakingRate": 1,
  "pitch": 0
 }
}

and I got a 200 response, with the content:

{
    "audioContent" : "UklGRn6iCwBXQVZFZm10I...(super long string)"
}

I assume this is encoded somehow (or decoded; I'm not sure of the right term), but I would like to actually hear what that "audioContent" is.


Solution

  • As Tanaike pointed out, the response is indeed Base64-encoded. To listen to the audio, I pasted the Base64 string (only the value of "audioContent", without the quotes) into a file named audio.txt, then ran:

    base64 -d audio.txt > audio.wav
    

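    and that did the trick. With LINEAR16, the decoded bytes already include a WAV header (the string starts with "UklGR", which is Base64 for "RIFF"), so the resulting file plays as-is.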
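
For anyone who would rather skip the copy-and-paste step, the same decoding can be sketched in Python. This is only a minimal illustration, assuming the full JSON response body has been saved or captured as a string. The function name save_audio_content and the file names are made up for this example and are not part of the Google API.

```python
import base64
import json
from pathlib import Path


def save_audio_content(response_json: str, out_path: str) -> int:
    """Decode the Base64 "audioContent" field and write it to out_path.

    save_audio_content is a hypothetical helper name for this sketch.
    Returns the number of bytes written.
    """
    # The API response is a JSON object with a single "audioContent" key.
    payload = json.loads(response_json)

    # Base64 text -> raw audio bytes (for LINEAR16 this is a complete WAV file).
    audio_bytes = base64.b64decode(payload["audioContent"])

    Path(out_path).write_bytes(audio_bytes)
    return len(audio_bytes)
```

Assuming the response was saved to a file called response.json (a hypothetical name), save_audio_content(Path("response.json").read_text(), "audio.wav") should produce the same audio.wav as the base64 -d command above.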