
Google TTS in Django: Create Audio File in Javascript from base64 String


I am using the "synthesize_text" function from Google's Text-to-Speech Python samples in one of my Django views.

def synthesize_text(text):
    """Synthesizes speech from the input string of text."""
    from google.cloud import texttospeech
    client = texttospeech.TextToSpeechClient()

    input_text = texttospeech.types.SynthesisInput(text=text)

    # Note: the voice can also be specified by name.
    # Names of voices can be retrieved with client.list_voices().
    voice = texttospeech.types.VoiceSelectionParams(
        language_code='en-US',
        ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)

    audio_config = texttospeech.types.AudioConfig(
        audio_encoding=texttospeech.enums.AudioEncoding.MP3)

    response = client.synthesize_speech(input_text, voice, audio_config)

    # The response's audio_content is binary.
    # Removing this because I do not care about writing the audio file
    # ----------------------------------------------------
    '''
    with open('output.mp3', 'wb') as out:
        out.write(response.audio_content)
        print('Audio content written to file "output.mp3"')
    '''
    # ----------------------------------------------------
    # instead return the encoded audio_content to decode and play in Javascript
    return response.audio_content


from django.shortcuts import render


def my_view(request):
    test_audio_content = synthesize_text('Test audio.')
    # render() takes the request as its first argument
    return render(request, 'my_template.html', {'test_audio_content': test_audio_content})

The only change I made to "synthesize_text" is that it returns audio_content instead of writing it to an audio file. I don't need to store the file; I just want to play the audio in my template with JavaScript. Google says the audio content is base64 encoded:

"Cloud Text-to-Speech API allows you to convert words and sentences into base64 encoded audio data of natural human speech. You can then convert the audio data into a playable audio file like an MP3 by decoding the base64 data."

So I tried creating and playing the audio with the following code, as suggested here:

<!-- my_template.html -->

<script>
var audio_content = "{{ test_audio_content }}";
var snd = new Audio("data:audio/mp3;base64," + audio_content);
console.log(snd);
snd.play();
</script>

But I get the following error:

Uncaught (in promise) DOMException: Failed to load because no supported source was found.

I logged audio_content to the console, and it starts with b&#39;ÿóDÄH.., so I'm not sure whether it is base64 at all. I also tried decoding it with:

var decoded_content = window.atob(audio_content);

And that gave me an error as well, claiming it isn't base64.
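To illustrate why atob rejects the value, here is a minimal stdlib-only Python sketch. It assumes the template ends up with the text form of a Python bytes object, which would explain the b&#39; prefix (Django's autoescaped b'). Such text is a Python representation, not base64, and its quote and backslash characters are outside the base64 alphabet. The byte values are made-up placeholders, not real audio; is_valid_base64 is a hypothetical helper.

```python
import base64
import binascii

# Placeholder bytes standing in for real MP3 data. 0xFF 0xF3 matches an MP3
# frame header, which is what "ÿó" in the logged output corresponds to.
audio_content = b"\xff\xf3D\xc4H"

# Text form of a raw bytes object: Python's b'...' representation.
rendered = str(audio_content)
print(rendered)  # b'\xff\xf3D\xc4H'


def is_valid_base64(text: str) -> bool:
    """Hypothetical helper: True if text decodes cleanly as standard base64."""
    try:
        base64.b64decode(text, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


print(is_valid_base64(rendered))  # False: "'" and "\" are not base64 characters

# Encoding the bytes first yields a str that is valid base64.
encoded = base64.b64encode(audio_content).decode("ascii")
print(is_valid_base64(encoded))  # True
```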


Solution

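From your example:

    The response's audio_content is binary

This means audio_content holds the raw MP3 bytes, not base64. The base64 encoding Google mentions is how the REST API carries the audio in its JSON response (the audioContent field); the Python client library decodes it for you. You therefore need to encode the bytes as base64 yourself before passing them to the template:

    import base64
    ...
    return base64.b64encode(response.audio_content).decode('ascii')

The .decode('ascii') turns the encoded bytes into a plain str, so the template receives the base64 text itself rather than a b'...' representation. With that change, your JS snippet should work exactly as you intended.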
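One possible variation is to build the whole data URI on the server, so the template only has to insert a single string. The sketch below assumes a hypothetical helper named audio_data_uri and defaults to audio/mpeg, the registered MIME type for MP3 (RFC 3003). The question's audio/mp3 is also widely accepted by browsers. The input bytes are placeholders, not real audio.

```python
import base64


def audio_data_uri(audio_content: bytes, mime_type: str = "audio/mpeg") -> str:
    """Hypothetical helper: wrap raw audio bytes in a base64 data: URI.

    audio_content is expected to be raw bytes, e.g. response.audio_content
    from the TTS client library (not already base64 encoded).
    """
    # base64 output is pure ASCII, so decoding to str is safe and avoids b'...'.
    encoded = base64.b64encode(audio_content).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Usage with placeholder bytes (not real audio):
uri = audio_data_uri(b"\xff\xf3D\xc4H")
print(uri)  # data:audio/mpeg;base64,//NExEg=
```

If a view passed this string under a hypothetical context name such as audio_src, the template could call new Audio("{{ audio_src }}") directly. Base64 only contains letters, digits, +, / and =, so Django's autoescaping leaves it unchanged.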