Multilanguage Google TTS sentence

I am experimenting with the Google Cloud TTS service and was wondering whether multi-language text synthesis is supported.

Specifically, I am trying to synthesize a sentence containing Greek and English words. I tried slicing the sentence into single-language parts, but the voices used for each language sound quite a bit off. Is there any known workaround?

Thanks in advance

Solution

Yes. Both Google TTS and Amazon Polly allow you to mix different languages and voices in the same audio sequence (a multilingual audio stream).

Amazon goes a bit further with some truly bilingual voices.

https://docs.aws.amazon.com/polly/latest/dg/bilingual-voices.html

For Google TTS, you need to use SSML; there are examples here:

https://cloud.google.com/text-to-speech/docs/ssml#voice

In my experience, both (Google/Amazon) offer 2 ways:

1. The native voice in language A tries to speak language B with 'some degree' of skill, but you can really feel the A-accent.
2. Switch voices, and speak each part with a native voice. This way you do have 2 different voices, but each one speaks perfectly, all in the same audio output.

(Python)

Example of way 1 (Google TTS):

ssml_text = '''
<speak>Here is some English text.
<lang xml:lang="es-ES">Y nos dieron las diez y la una ...</lang></speak>
'''
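If the text arrives already split by language, the same kind of string can be assembled programmatically. The following is only a minimal sketch: `lang_ssml` is a hypothetical helper (not part of any Google library), and it assumes each part is a `(text, language_code)` pair, with `None` meaning "use the voice's own language". It escapes the text so that characters such as `&` do not break the SSML.

```python
from xml.sax.saxutils import escape, quoteattr


def lang_ssml(parts):
    """Build an SSML string from (text, lang_code) pairs.

    Hypothetical helper for illustration. A lang_code of None leaves the
    text untagged, so the selected voice reads it in its own language.
    """
    chunks = []
    for text, lang in parts:
        safe = escape(text)  # escape &, < and > so the SSML stays well-formed
        if lang is None:
            chunks.append(safe)
        else:
            chunks.append(f"<lang xml:lang={quoteattr(lang)}>{safe}</lang>")
    return "<speak>" + " ".join(chunks) + "</speak>"


# Same content as the hand-written example above
ssml_text = lang_ssml([
    ("Here is some English text.", None),
    ("Y nos dieron las diez y la una ...", "es-ES"),
])
```

The resulting `ssml_text` can be sent to the service exactly like a hand-written SSML string.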

Example of way 2 (Google TTS):

ssml_text = '''
<speak>And there it was 

<voice name="en-GB-Wavenet-B">
a flying bird 
</voice>

<voice name="es-ES-Wavenet-B">
Un chamaco muy revoltoso que no para de reirse.
</voice>

<voice name="fr-FR-Wavenet-B">
Je ne parle pas français.
</voice>

</speak>
'''
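To turn such a string into audio, something along these lines should work with the official Python client (`pip install google-cloud-texttospeech`), assuming application default credentials are already configured. Treat it as a sketch, not a definitive implementation: `voice_ssml` and `synthesize_ssml` are hypothetical helper names, the output path is arbitrary, and the Greek voice name `el-GR-Wavenet-A` is an assumption that should be checked against the current voice list.

```python
from xml.sax.saxutils import escape, quoteattr


def voice_ssml(segments):
    """Wrap each (voice_name, text) pair in its own <voice> element.

    Hypothetical helper for illustration; the text is XML-escaped.
    """
    body = "".join(
        f"<voice name={quoteattr(name)}>{escape(text)}</voice>"
        for name, text in segments
    )
    return f"<speak>{body}</speak>"


def synthesize_ssml(ssml_text, out_path, fallback_language="en-GB"):
    """Send SSML to Cloud Text-to-Speech and write the MP3 to out_path.

    Needs google-cloud-texttospeech and valid credentials. The import is
    done here so the SSML helper above works without the library.
    """
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(ssml=ssml_text),
        # Used for any text that is not inside a <voice> element
        voice=texttospeech.VoiceSelectionParams(language_code=fallback_language),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    with open(out_path, "wb") as out:
        out.write(response.audio_content)


# Mixed English/Greek sentence; "el-GR-Wavenet-A" is an assumed voice name
mixed = voice_ssml([
    ("en-GB-Wavenet-B", "The Greek word for hello is"),
    ("el-GR-Wavenet-A", "γεια σου"),
])
```

Calling `synthesize_ssml(mixed, "mixed.mp3")` would then request the audio and save it locally; nothing is sent to the API until that call is made.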

You can practice a bit here: https://cloud.google.com/text-to-speech ([updated 2024-09-02] now https://console.cloud.google.com/speech/text-to-speech).

But just a bit, because if your SSML text is too complex, it gets simplified (in the picture, compare the upper text with the lower text).

Luckily, Google has extensive documentation, and here you can find code to practice and experiment with:

https://cloud.google.com/text-to-speech/docs/ssml-tutorial