Tags: python, google-api, text-to-speech, pyttsx, gtts

How to use multiple languages in 'gTTS' for a single input line?


I want to convert text to speech from a document that contains multiple languages. When I run the following code, the languages are not pronounced clearly. How can I save this kind of mixed-language text to audio so that each language sounds right?

from gtts import gTTS
mytext = 'Welcome to gtts! আজ একটি ভাল দিন। tumi kemon acho? ٱلْحَمْدُ لِلَّٰهِ'
language = 'ar' # arabic
myobj = gTTS(text=mytext, tld='co.in', lang=language, slow=False)
myobj.save("audio.mp3")

Solution

  • Text-to-speech alone isn't enough here, since gTTS speaks only one language per request.
    To solve this, we need to detect the language of each part of the sentence,
    run each part through text-to-speech with the matching language,
    and append the result to the final audio.
    Ideally you would use a dedicated language-identification model (there are plenty) to do this classification for you.
    Just as a proof of concept, I used googletrans to detect the language of each word and gTTS to build an MP3 file from the results.

    It's not bulletproof, especially with Arabic text: googletrans sometimes returns a language code that gTTS does not recognize (here the Arabic words come back as "sd", Sindhi). For that reason, lang_code_table maps such codes to ones that gTTS accepts.

    Here is a working example:

    from googletrans import Translator
    from gtts import gTTS
    
    input_text = "Welcome to gtts! আজ একটি ভাল দিন। tumi kemon acho? ٱلْحَمْدُ لِلَّٰهِ"
    # split() with no argument also drops empty strings caused by repeated spaces
    words = input_text.split()
    translator = Translator()
    language, sentence = None, ""
    
    # googletrans may return codes that gTTS rejects; map them to codes gTTS accepts
    lang_code_table = {"sd": "ar"}
    # Sentence-ending marks, including the Bengali danda "।"
    end_marks = ("?", ".", "!", "।")
    
    with open("output.mp3", "wb") as ff:
        for word in words:
            # Detect language of current word (one network request per word)
            word_language = translator.detect(word).lang
    
            if word_language == language:
                # Same language, append word to the sentence
                sentence += " " + word
            else:
                if language is None:
                    # No language set yet, initialize and continue
                    language, sentence = word_language, word
                    continue
    
                if word.endswith(end_marks):
                    # A word ending in punctuation is treated as part of the previous sentence
                    sentence += " " + word
                    continue
    
                # The previous sentence is complete: synthesize it and append the MP3 data
                gTTS(text=sentence, lang=lang_code_table.get(language, language), slow=False).write_to_fp(ff)
    
                # Start a new sentence in the other language
                language, sentence = word_language, word
    
        if language and sentence:
            # Append the last sentence
            gTTS(text=sentence, lang=lang_code_table.get(language, language), slow=False).write_to_fp(ff)
    

    It's slow (one googletrans request per word, plus one gTTS request per language run) and won't scale to longer text.
    It also needs a better tokenizer and proper error handling.
    Again, it's just a proof of concept.
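The grouping logic in the loop above can be pulled out of the network calls so that it is easy to test on its own. Below is a minimal offline sketch of that logic, assuming a caller-supplied `detect` function stands in for `translator.detect(word).lang`; the names `group_runs` and `END_MARKS` are hypothetical. Instead of writing audio directly, it returns a list of `(lang, text)` runs.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Sentence-ending marks, including the Bengali danda "।" (assumption: extend as needed)
END_MARKS = ("?", ".", "!", "।")


def group_runs(
    words: Iterable[str],
    detect: Callable[[str], str],
    code_table: Optional[Dict[str, str]] = None,
) -> List[Tuple[str, str]]:
    """Merge consecutive words with the same detected language into (lang, text) runs."""
    code_table = code_table or {}
    runs: List[Tuple[str, str]] = []
    language, sentence = None, ""
    for word in words:
        word_language = detect(word)
        if language is None:
            # First word: start the first run
            language, sentence = word_language, word
        elif word_language == language or word.endswith(END_MARKS):
            # Same language, or a word that closes the current sentence
            sentence += " " + word
        else:
            # Language changed: close the current run (mapping its code) and start a new one
            runs.append((code_table.get(language, language), sentence))
            language, sentence = word_language, word
    if language is not None and sentence:
        runs.append((code_table.get(language, language), sentence))
    return runs
```

Each `(lang, text)` pair can then be passed in order to `gTTS(text=text, lang=lang, slow=False).write_to_fp(ff)`, as the answer does. Keeping detection, grouping and synthesis separate also gives a natural place to add the error handling the answer mentions.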
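One possible "better tokenizer" is to classify each word by the Unicode script of its first letter, using only the standard `unicodedata` module and no network requests. This is a swapped-in heuristic, not what the answer uses: `SCRIPT_TO_LANG`, `script_of` and `detect_by_script` are hypothetical names, and the script-to-language mapping is an assumption you would tailor to your own documents.

```python
import unicodedata
from typing import Optional

# Hypothetical mapping from Unicode script name (first word of a character's
# Unicode name) to a gTTS language code; adjust for your own documents.
SCRIPT_TO_LANG = {"LATIN": "en", "BENGALI": "bn", "ARABIC": "ar"}


def script_of(word: str) -> Optional[str]:
    """Return the script of the first letter in word, e.g. 'LATIN' or 'BENGALI'."""
    for ch in word:
        if ch.isalpha():
            # e.g. unicodedata.name("আ") == "BENGALI LETTER AA"
            return unicodedata.name(ch, "UNKNOWN").split()[0]
    return None  # no letters at all (digits, punctuation)


def detect_by_script(word: str, default: str = "en") -> str:
    """Map a word to a gTTS language code via its script; fall back to default."""
    script = script_of(word)
    if script is None:
        return default
    return SCRIPT_TO_LANG.get(script, default)
```

`detect_by_script(word)` could replace `translator.detect(word).lang` in the loop. The trade-off is that script alone cannot tell romanized Bengali such as "tumi kemon acho?" from English, so those words come out as "en"; separating them still needs a statistical detector like googletrans or a language-identification model.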