I just started playing with the Google Text-to-Speech API. I sent a POST request to:
https://texttospeech.googleapis.com/v1/text:synthesize?fields=audioContent&key={YOUR_API_KEY}
with the following data:
{
  "input": {
    "text": "Hola esto es una prueba"
  },
  "voice": {
    "languageCode": "es-419"
  },
  "audioConfig": {
    "audioEncoding": "LINEAR16",
    "speakingRate": 1,
    "pitch": 0
  }
}
and I got a 200 response, with the content:
{
  "audioContent": "UklGRn6iCwBXQVZFZm10I...(super long string)"
}
I assume this is encoded somehow (or decoded, I'm not sure about the naming), but I would like to actually hear what that "audioContent" contains.
As Tanaike pointed out, the response is indeed Base64. To actually listen to the audio, I pasted the Base64-encoded string (only the value of "audioContent", without the surrounding JSON or quotes) into a file, then ran:
base64 -d audio.txt > audio.wav
and that did the trick.
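If you would rather not copy the string out by hand, the same decoding step can be sketched in Python. This is only a minimal sketch: the function name save_audio_content is made up for illustration, and it assumes you have the full JSON response body as a string. As a sanity check, the leading "UklG" of the string above is the Base64 encoding of "RIF", the start of the "RIFF" signature that opens a WAV file, which suggests the decoded bytes can be saved directly as .wav.

```python
import base64
import json
from pathlib import Path


def save_audio_content(response_json: str, out_path: str) -> int:
    """Decode the Base64 "audioContent" field and write the bytes to out_path.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    payload = json.loads(response_json)
    # validate=True raises binascii.Error on characters outside the Base64
    # alphabet, e.g. if stray text was pasted along with the string.
    audio_bytes = base64.b64decode(payload["audioContent"], validate=True)
    Path(out_path).write_bytes(audio_bytes)
    return len(audio_bytes)
```

For example, assuming the response body was saved to a file called response.json (a hypothetical name), calling save_audio_content(Path("response.json").read_text(), "audio.wav") should produce the same audio.wav as the base64 -d command above.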