
How to use Google's Text-to-Speech API in Python

My API key is ready, and I want to send requests to Google to turn text into speech.
I tried these commands and many more.
As far as I can find, the docs offer no straightforward way to get started with Python. I don't know where my API key goes, or how it fits together with the JSON and the URL.

One solution in their docs is for cURL, but it involves downloading a .txt file after the request, which apparently has to be sent back to them in order to get the audio file. Is there a way to do this in Python that doesn't involve returning the .txt to them? I just want my list of strings returned as audio files.

My Code

(I put my actual key in the block above. I'm just not going to share it here.)
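The .txt in the cURL example appears to be only where the JSON response is saved: its `audioContent` field holds base64-encoded audio that is decoded locally, so nothing needs to be sent back to Google. Below is a minimal standard-library sketch of the same request in Python. It assumes your API key is permitted to call the Text-to-Speech API; `build_request`, `decode_audio`, `synthesize_mp3` and the voice name `en-US-Wavenet-C` are illustrative choices, not part of any Google library.

```python
import base64
import json
import urllib.parse
import urllib.request

# REST endpoint for text:synthesize (v1 of the Cloud Text-to-Speech API).
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_request(text: str, api_key: str,
                  voice_name: str = "en-US-Wavenet-C") -> urllib.request.Request:
    """Build a POST request; the API key goes in the query string, the JSON in the body."""
    body = {
        "input": {"text": text},
        "voice": {"languageCode": "en-US", "name": voice_name},  # example voice (assumption)
        "audioConfig": {"audioEncoding": "MP3"},
    }
    url = f"{SYNTHESIZE_URL}?{urllib.parse.urlencode({'key': api_key})}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def decode_audio(response_body: bytes) -> bytes:
    """The JSON response carries the audio as base64 text under 'audioContent'."""
    return base64.b64decode(json.loads(response_body)["audioContent"])


def synthesize_mp3(text: str, api_key: str, path: str) -> None:
    """Send one request and write the decoded MP3 bytes to `path`."""
    with urllib.request.urlopen(build_request(text, api_key)) as resp:
        audio = decode_audio(resp.read())
    with open(path, "wb") as f:
        f.write(audio)
```

Usage would be something like `synthesize_mp3("Do no evil!", "YOUR_API_KEY", "output.mp3")`, called once per string in your list with a different output path each time.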


  • Configure the Python app to use the JSON key file and install the client library

    1. Create a service account.
    2. Create a service account key for that service account.
    3. Download the JSON key file and store it securely.
    4. Point your Python app at the credentials, e.g. by setting the GOOGLE_APPLICATION_CREDENTIALS environment variable to the JSON file's path.
    5. Install the library: pip install --upgrade google-cloud-texttospeech
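Step 4 can be done from Python itself. The sketch below shows two options, assuming a service-account JSON key on disk: setting GOOGLE_APPLICATION_CREDENTIALS before the client is created, or loading the key explicitly with the client's `from_service_account_file` class method. The helper names `configure_credentials` and `client_from_key_file` are hypothetical.

```python
import os
from pathlib import Path


def configure_credentials(key_path: str) -> str:
    """Hypothetical helper: point Google client libraries at a service-account key."""
    path = Path(key_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Service account key not found: {path}")
    # Client libraries pick this variable up via Application Default Credentials.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path)
    return str(path)


def client_from_key_file(key_path: str):
    """Hypothetical helper: load the key explicitly instead of via the environment."""
    # Imported here so configure_credentials() works even without the library installed.
    from google.cloud import texttospeech

    return texttospeech.TextToSpeechClient.from_service_account_file(key_path)
```

With the first option, call `configure_credentials("path/to/key.json")` once at startup and then create `texttospeech.TextToSpeechClient()` as usual; the second option avoids touching the process environment.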

    The code below is based on Google's Python examples. Note: Google's example does not include the name parameter correctly.

    Below is the example, modified to use Google application credentials and a female WaveNet voice.

    from google.cloud import texttospeech
    # Instantiates a client (credentials are read from GOOGLE_APPLICATION_CREDENTIALS)
    client = texttospeech.TextToSpeechClient()
    # Set the text input to be synthesized
    synthesis_input = texttospeech.SynthesisInput(text="Do no evil!")
    # Build the voice request: select the language code ("en-US"),
    # the voice NAME, and the SSML voice gender (FEMALE)
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="en-US-Wavenet-C",  # example WaveNet voice; must match language_code
        ssml_gender=texttospeech.SsmlVoiceGender.FEMALE,
    )
    # Select the type of audio file you want returned
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    # Perform the text-to-speech request on the text input with the selected
    # voice parameters and audio file type
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )
    # The response's audio_content is binary.
    with open("output.mp3", "wb") as out:
        # Write the response to the output file.
        out.write(response.audio_content)
        print('Audio content written to file "output.mp3"')
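For the question's actual goal (a list of strings in, one audio file per string out), the request above can be wrapped in a loop. This is a sketch under assumptions: it targets the 2.x `google-cloud-texttospeech` API, and `synthesize_to_files`, `make_google_synthesizer` and the `clip_000.mp3` naming scheme are hypothetical. The file-writing loop takes any `str -> bytes` function, which keeps it testable without network access.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def synthesize_to_files(texts: Iterable[str], synthesize: Callable[[str], bytes],
                        out_dir: str = "audio", prefix: str = "clip") -> List[Path]:
    """Write one MP3 per input string and return the paths that were written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, text in enumerate(texts):
        path = out / f"{prefix}_{i:03d}.mp3"
        path.write_bytes(synthesize(text))
        paths.append(path)
    return paths


def make_google_synthesizer(voice_name: str = "en-US-Wavenet-C") -> Callable[[str], bytes]:
    """Return a str -> MP3 bytes function backed by the Text-to-Speech client (2.x API assumed)."""
    # Imported here so synthesize_to_files() has no hard dependency on the library.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name=voice_name,
        ssml_gender=texttospeech.SsmlVoiceGender.FEMALE,
    )
    audio_config = texttospeech.AudioConfig(audio_encoding=texttospeech.AudioEncoding.MP3)

    def synthesize(text: str) -> bytes:
        response = client.synthesize_speech(
            input=texttospeech.SynthesisInput(text=text),
            voice=voice,
            audio_config=audio_config,
        )
        return response.audio_content

    return synthesize
```

Usage: `synthesize_to_files(["Do no evil!", "Hello world."], make_google_synthesizer())` would write `audio/clip_000.mp3` and `audio/clip_001.mp3`. Reusing one client and one voice/config for every string avoids rebuilding them per request.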

    Voices, Name, Language Code, SSML Gender, etc.

    List of Voices:
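The available voices can also be fetched programmatically. A minimal sketch, assuming the 2.x client's `list_voices(language_code=...)` call, whose response has a `voices` list where each entry carries `name`, `language_codes` and `ssml_gender`. The helpers `describe_voices` and `fetch_voices` are hypothetical, and filtering on the substring "Wavenet" is simply a convenient way to find WaveNet voices by name.

```python
from typing import Iterable, List


def describe_voices(voices: Iterable, name_filter: str = "Wavenet") -> List[str]:
    """Format voices whose name contains name_filter as 'name | languages | gender'."""
    lines = []
    for v in voices:
        if name_filter in v.name:
            # Enum fields expose .name (e.g. "FEMALE"); fall back to str() otherwise.
            gender = getattr(v.ssml_gender, "name", str(v.ssml_gender))
            lines.append(f"{v.name} | {', '.join(v.language_codes)} | {gender}")
    return sorted(lines)


def fetch_voices(language_code: str = "en-US"):
    """Ask the API for the voices available for one language code."""
    # Imported here so describe_voices() can be used without the library installed.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    return client.list_voices(language_code=language_code).voices
```

For example, `print("\n".join(describe_voices(fetch_voices())))` would print the en-US WaveNet voices with their genders, which is one way to pick a valid value for the name parameter.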

    In the synthesis code above, I changed the voice from Google's example code to include the name parameter, to use a WaveNet voice (much improved, but more expensive at $16 per million characters), and to set the SSML gender to FEMALE.

    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US", name="en-US-Wavenet-C", ssml_gender=texttospeech.SsmlVoiceGender.FEMALE)
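To get a feel for what the $16-per-million-characters WaveNet rate quoted above means for a list of strings, here is a rough estimate. It is only a sketch: it counts Python characters with `len()` and ignores any free tier, how Google counts SSML markup, and later price changes. `estimate_cost` is a hypothetical helper.

```python
from typing import Iterable

# WaveNet rate quoted in this answer; check Google's current pricing page.
USD_PER_MILLION_CHARS = 16.0


def estimate_cost(texts: Iterable[str], usd_per_million: float = USD_PER_MILLION_CHARS) -> float:
    """Approximate cost in USD: total characters times the per-million-character rate."""
    chars = sum(len(t) for t in texts)
    return chars * usd_per_million / 1_000_000
```

At that rate, one million characters cost $16, so a single short phrase such as "Do no evil!" (11 characters) costs a tiny fraction of a cent.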