I am experimenting with the Google Cloud TTS service and I was wondering if multi-language text synthesis is supported.
Specifically, I am trying to synthesize a sentence containing Greek and English words. I tried slicing the sentence into single-language parts, but the voices used for each language sound quite a bit off. Is there any known workaround?
Thanks in advance
Yes. Both Google TTS and Amazon Polly allow you to mix different languages/voices in the same audio sequence (a multilingual audio stream).
Amazon goes a bit further with some truly bilingual voices.
https://docs.aws.amazon.com/polly/latest/dg/bilingual-voices.html
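A minimal sketch of calling Polly with a bilingual voice through boto3, assuming the `boto3` package is installed and AWS credentials are configured. The helper `build_polly_request`, the voice `Aditi` (listed by AWS as Hindi/Indian English) and the sample text are illustrative choices, not part of the answer; check the page above for a voice covering the language pair you need.

```python
def build_polly_request(text: str, voice_id: str = "Aditi",
                        language_code: str = "en-IN") -> dict:
    """Keyword arguments for Polly's synthesize_speech (hypothetical helper).

    Assumption: a bilingual voice such as Aditi reads mixed Hindi/English
    text without extra markup; LanguageCode selects which of its languages
    is used as the default.
    """
    return {
        "Text": text,
        "TextType": "text",             # plain text; use "ssml" for SSML input
        "VoiceId": voice_id,
        "LanguageCode": language_code,  # only needed for bilingual voices
        "OutputFormat": "mp3",
    }


if __name__ == "__main__":
    import boto3  # needs AWS credentials configured in the environment

    request = build_polly_request("Hello, आप कैसे हैं?")
    response = boto3.client("polly").synthesize_speech(**request)
    # AudioStream is a streaming body: read it fully and save it to disk.
    with open("polly_mixed.mp3", "wb") as f:
        f.write(response["AudioStream"].read())
```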
For Google TTS, you need to use SSML; there are examples here:
https://cloud.google.com/text-to-speech/docs/ssml#voice
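For reference, a minimal sketch of sending an SSML string to Google Cloud TTS from Python, assuming the `google-cloud-texttospeech` package (2.x) is installed and application default credentials are set up. The helpers `check_ssml` and `synthesize_ssml`, the default `en-US` language code and the output filename are assumptions for illustration. The SSML is parsed locally first, because malformed markup is likely to be rejected by the API with a less helpful error.

```python
import xml.etree.ElementTree as ET


def check_ssml(ssml: str) -> ET.Element:
    """Parse SSML locally so XML mistakes surface before the API call."""
    root = ET.fromstring(ssml.strip())
    if root.tag != "speak":
        raise ValueError("SSML must have a <speak> root element")
    return root


def synthesize_ssml(ssml: str, out_path: str, language_code: str = "en-US") -> None:
    """Send SSML to Google Cloud TTS and write the MP3 audio to out_path."""
    # Imported here so check_ssml works without the client library installed.
    from google.cloud import texttospeech

    check_ssml(ssml)
    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(ssml=ssml),
        # Default voice for the request; <voice> tags in the SSML can switch
        # voices per fragment, and <lang> tags mark fragments in another language.
        voice=texttospeech.VoiceSelectionParams(language_code=language_code),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    with open(out_path, "wb") as f:
        f.write(response.audio_content)


if __name__ == "__main__":
    synthesize_ssml('<speak>Hello <lang xml:lang="el-GR">Καλημέρα</lang></speak>', "mixed.mp3")
```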
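In my experience, both (Google/Amazon) have two ways:
1. keep the current voice and mark a fragment as being in another language (the <lang> tag in SSML);
2. switch to a different voice for a fragment (in Google TTS, the <voice> tag).
(Examples in Python.)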
Example of method 1 (Google TTS):
ssml_text = '''
<speak>Here is some English text.
<lang xml:lang="es-ES">Y nos dieron las diez y la una ...</lang></speak>
'''
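Applied to the question's case, a sentence mixing English and Greek can be split into (language, text) segments and assembled into SSML shaped like the example above. This is a minimal sketch; the helper `lang_ssml`, the segment format and the `en-US` base language are assumptions for illustration. The text is escaped because characters such as `&` or `<` would otherwise break the SSML.

```python
from xml.sax.saxutils import escape, quoteattr


def lang_ssml(segments, base_lang="en-US"):
    """Build SSML where segments not in base_lang are wrapped in <lang> tags.

    segments: list of (language_code, text) pairs, in speaking order.
    """
    parts = []
    for lang, text in segments:
        if lang == base_lang:
            parts.append(escape(text))
        else:
            # quoteattr adds the surrounding quotes and escapes the value.
            parts.append(f"<lang xml:lang={quoteattr(lang)}>{escape(text)}</lang>")
    return "<speak>" + " ".join(parts) + "</speak>"


if __name__ == "__main__":
    sentence = [("en-US", "The Greek word"), ("el-GR", "ευχαριστώ"), ("en-US", "means thank you.")]
    print(lang_ssml(sentence))
    # <speak>The Greek word <lang xml:lang="el-GR">ευχαριστώ</lang> means thank you.</speak>
```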
Example of method 2 (Google TTS):
ssml_text = '''
<speak>And there it was
<voice name="en-GB-Wavenet-B">
a flying bird
</voice>
<voice name="es-ES-Wavenet-B">
Un chamaco muy revoltoso que no para de reírse.
</voice>
<voice name="fr-FR-Wavenet-B">
Je ne parle pas français.
</voice>
</speak>
'''
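Since the question mentions that the per-language voices sound off, the voice-switching approach lets you pick a specific voice per language and swap it until the result sounds right. A minimal sketch, assuming a hypothetical `VOICES` mapping: `en-GB-Wavenet-B` comes from the example above, while `el-GR-Wavenet-A` is an assumed Greek voice name, so check the voice list available to your project.

```python
from typing import Mapping
from xml.sax.saxutils import escape, quoteattr

# Hypothetical language-to-voice mapping; el-GR-Wavenet-A is an assumption.
VOICES = {"en-GB": "en-GB-Wavenet-B", "el-GR": "el-GR-Wavenet-A"}


def voice_ssml(segments, voices: Mapping[str, str] = VOICES) -> str:
    """Build SSML that switches to the mapped voice for each segment.

    segments: list of (language_code, text) pairs; an unmapped language
    raises KeyError rather than silently using the wrong voice.
    """
    parts = [
        f"<voice name={quoteattr(voices[lang])}>{escape(text)}</voice>"
        for lang, text in segments
    ]
    return "<speak>" + " ".join(parts) + "</speak>"


if __name__ == "__main__":
    print(voice_ssml([("en-GB", "Thank you is"), ("el-GR", "ευχαριστώ")]))
```

Merging consecutive segments in the same language before calling it keeps the number of voice switches down.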
You can practice a bit in the demo at https://cloud.google.com/text-to-speech [updated 2024-09-02: it is now at https://console.cloud.google.com/speech/text-to-speech].
But only a bit, because if your SSML is too complex, the demo simplifies it (see the picture: upper vs. lower text).
Luckily, Google has extensive documentation, and here you can find code to practice and experiment with.