
Multilanguage Google TTS sentence


I am experimenting with the Google Cloud TTS service and I was wondering if multi-language text synthesis is supported.

Specifically, I am trying to synthesize a sentence containing Greek and English words. I tried slicing the sentence into single-language parts, but the voices used for each language sound quite a bit off. Is there any known workaround?

Thanks in advance


Solution

  • Yes. Both Google TTS and Amazon Polly let you mix different languages and voices in the same audio sequence (a multilingual audio stream).

    Amazon goes a bit further with some truly bilingual voices.

    https://docs.aws.amazon.com/polly/latest/dg/bilingual-voices.html

    For Google TTS, you need to use SSML; there are examples here:

    https://cloud.google.com/text-to-speech/docs/ssml#voice


    In my experience, both services (Google and Amazon) offer two approaches:

    1. A native voice for language A speaks language B with some degree of skill, but you can clearly hear the A accent.
    2. Switch voices and speak each part with a native voice. You end up with two different voices, but each one speaks its own language perfectly, all in the same audio output.

    (Python)

    Example of 1 (Google TTS):

    ssml_text = '''
    <speak>Here is some English text.
    <lang xml:lang="es-ES">Y nos dieron las diez y la una ...</lang></speak>
    '''
    

    Example of 2 (Google TTS):

    ssml_text = '''
    <speak>And there it was 
    
    <voice name="en-GB-Wavenet-B">
    a flying bird 
    </voice>
    
    <voice  name="es-ES-Wavenet-B">
    Un chamaco muy revoltoso que no para de reirse.
    </voice>
    
    <voice  name="fr-FR-Wavenet-B">
    Je ne parle pas français.
    </voice>
    
    </speak>
    '''
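
If the sentence already arrives as separate single-language pieces (as in the question), the SSML for approach 2 can be assembled programmatically instead of written by hand. This is only a minimal sketch: `build_multivoice_ssml` is a hypothetical helper, not part of any Google library, and the Greek voice name `el-GR-Wavenet-A` is an assumption that should be checked against the supported-voices list.

```python
from xml.sax.saxutils import escape, quoteattr


def build_multivoice_ssml(segments):
    """Wrap each (voice_name, text) pair in its own <voice> element.

    Hypothetical helper for illustration. The text is XML-escaped so
    characters such as & or < do not break the SSML document.
    """
    parts = ["<speak>"]
    for voice_name, text in segments:
        parts.append(f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>")
    parts.append("</speak>")
    return "\n".join(parts)


ssml_text = build_multivoice_ssml([
    ("en-GB-Wavenet-B", "The Greek word for good morning is"),
    ("el-GR-Wavenet-A", "καλημέρα"),  # assumed voice name, verify before use
])
print(ssml_text)
```

Escaping matters here because a stray `&` or `<` in the input text would otherwise make the whole SSML document invalid.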
    

    You can practice a bit here: https://cloud.google.com/text-to-speech

    But just a bit, because if your SSML text is too complex, it gets simplified (in the screenshot, compare the upper text with the lower text).

    [screenshot: upper vs. lower text]


    Luckily, Google has extensive documentation, and here you can find code to practice and experiment with:

    https://cloud.google.com/text-to-speech/docs/ssml-tutorial
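
For reference, a condensed sketch of sending an `ssml_text` string like the ones above to the API. It assumes the `google-cloud-texttospeech` Python package is installed and application credentials are configured. The function names `check_ssml` and `synthesize_ssml`, the `en-US` fallback voice, and the output path are illustrative choices, not taken from the tutorial.

```python
import xml.etree.ElementTree as ET


def check_ssml(ssml):
    """Return True if ssml is well-formed XML with a <speak> root.

    Only a local sanity check; the service may still reject tags
    or voice names it does not support.
    """
    try:
        return ET.fromstring(ssml.strip()).tag == "speak"
    except ET.ParseError:
        return False


def synthesize_ssml(ssml, out_path="output.mp3"):
    """Send SSML to Cloud Text-to-Speech and save the MP3 it returns."""
    # Imported here so check_ssml works even without the package installed.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(ssml=ssml),
        # Voice used for text outside any <voice> element (assumed choice).
        voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    with open(out_path, "wb") as out:
        out.write(response.audio_content)
    return out_path
```

Running `check_ssml(ssml_text)` before `synthesize_ssml(ssml_text)` catches unbalanced tags locally, which is cheaper than a failed API request.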