python google-api text-to-speech pyttsx gtts

How to use multiple languages in 'gTTS' for a single input line?

I want to convert text to speech from a document that contains multiple languages. When I run the following code, the languages are not all spoken clearly. How can I save audio for this kind of mixed-language text so that each language is pronounced clearly?

from gtts import gTTS
mytext = 'Welcome to gtts! আজ একটি ভাল দিন। tumi kemon acho? ٱلْحَمْدُ لِلَّٰهِ'
language = 'ar' # arabic
myobj = gTTS(text=mytext, tld='co.in', lang=language, slow=False)
myobj.save("audio.mp3")

Solution

Text-to-speech alone is not enough here, since a single gTTS call works with one language only.
To solve this problem we need to detect the language of each part of the sentence,
then run each part through text-to-speech and append the result to the final audio.
Ideally you would use a neural network (there are plenty) to do this classification for you.
Just as a proof of concept, I used googletrans to detect the language of each part of the sentence and gtts to make an mp3 file from it.

It's not bulletproof, especially with Arabic text. googletrans sometimes detects a language code that gtts does not recognize (here "sd" for the Arabic part). For that reason we use a lookup table, lang_code_table, to pick a language code that works with gtts.
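The lookup-table idea can be pulled into a small helper. This is only a sketch: the name `to_tts_lang`, the `aliases` and `default` parameters, and the fallback to a base code such as "zh" for "zh-CN" are assumptions of mine, not part of gtts or googletrans. The `supported` set is passed in by the caller; with gtts installed it could plausibly be built from `gtts.lang.tts_langs()`.

```python
def to_tts_lang(detected, supported, aliases=None, default="en"):
    """Map a detected language code to one the TTS engine accepts.

    detected  -- code returned by the language detector, e.g. "sd"
    supported -- set of codes the TTS engine accepts (supplied by the caller)
    aliases   -- hypothetical manual overrides, e.g. {"sd": "ar"}
    default   -- code to use when nothing matches
    """
    aliases = aliases or {}
    # Apply a manual override first, if one exists
    code = aliases.get(detected, detected)
    if code in supported:
        return code
    # Try the base language of a regional code, e.g. "zh-CN" -> "zh"
    base = code.split("-")[0]
    if base in supported:
        return base
    # Nothing matched: fall back instead of letting the TTS call fail
    return default
```

For example, `to_tts_lang("sd", {"ar", "bn", "en"}, {"sd": "ar"})` picks "ar", which is what the hard-coded table does for the Arabic part of the sample text.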

Here is a working example:

from googletrans import Translator
from gtts import gTTS

input_text = "Welcome to gtts! আজ একটি ভাল দিন। tumi kemon acho? ٱلْحَمْدُ لِلَّٰه"
words = input_text.split(" ")
translator = Translator()
language, sentence = None, ""

lang_code_table = {"sd": "ar"}

with open('output.mp3', 'wb') as ff:
    for word in words:
        # split(" ") yields empty strings (never " ") for repeated spaces, so skip those
        if not word.strip():
            continue
        # Detect language of current word
        word_language = translator.detect(word).lang

        if word_language == language:
            # Same language, append word to the sentence
            sentence += " " + word
        else:
            if language is None:
                # No language set yet, initialize and continue
                language, sentence = word_language, word
                continue

            if word.endswith(("?", ".", "!")):
                # If word endswith one of the punctuation marks, it should be part of previous sentence
                sentence += " " + word
                continue

            # We have whole previous sentence, translate it into speech and append to mp3 file
            gTTS(text=sentence, lang=lang_code_table.get(language, language), slow=False).write_to_fp(ff)

            # Continue with other language
            language, sentence = word_language, word

    if language and sentence:
        # Append last detected sentence
        gTTS(text=sentence, lang=lang_code_table.get(language, language), slow=False).write_to_fp(ff)

It's obviously not fast, since it makes one detection call per word, and it won't suit longer texts.
It also needs a better tokenizer and proper error handling.
Again, it's just a proof of concept.
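One possible direction for a better tokenizer is to group consecutive words by Unicode script before doing any language detection, so that the detector runs once per group instead of once per word. The sketch below uses only the standard library; `word_script` and `group_by_script` are hypothetical names of mine. A script is not a language: romanized Bengali such as "tumi kemon acho?" lands in the same LATIN group as English, so each group would still need a language detector afterwards.

```python
import unicodedata


def word_script(word):
    """Return a coarse script label for a word, based on its first letter."""
    for ch in word:
        if ch.isalpha():
            # Unicode names start with the script, e.g. "BENGALI LETTER AA"
            return unicodedata.name(ch, "UNKNOWN").split(" ")[0]
    return None  # the word has no letters (punctuation or digits only)


def group_by_script(text):
    """Split text into (script, phrase) runs of consecutive same-script words."""
    runs = []
    for word in text.split():
        script = word_script(word)
        if runs and (script is None or script == runs[-1][0]):
            # Same script, or a letterless token: stay in the current run
            runs[-1][1].append(word)
        else:
            runs.append([script or "UNKNOWN", [word]])
    return [(script, " ".join(words)) for script, words in runs]
```

Each returned phrase could then be passed to the detector and to gTTS as a whole, which keeps punctuation attached to its sentence and reduces the number of network calls.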