
How to tell Googletrans to ignore certain parts?


I would like to use googletrans to access the Google Translate API. However, some of my strings contain variable placeholders:

User "%(first_name)s %(last_name)s (%(email)s)" has been assigned.

If I pass this string to googletrans:

from googletrans import Translator
translator = Translator()
translator.translate(u'User "%(first_name)s %(last_name)s (%(email)s)" has been assigned.', src='en', dest='fr').text

I get the following:

L'utilisateur "% (first_name) s% (last_name) s (% (email) s)" a été affecté.

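However, spaces have been inserted into the placeholders ("% (first_name) s% (last_name) s (% (email) s)"), so they are no longer valid format specifiers. Is there a way around this? I've already tried wrapping the placeholders in a notranslate span: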

u'User "<span class="notranslate">%(first_name)s %(last_name)s (%(email)s)</span>" has been assigned.'

Solution

  • It seems Googletrans leaves tokens such as __1__ untouched. So you can replace %(first_name)s with __0__, %(last_name)s with __1__, etc. before you translate, and then restore the variables afterwards. Here is code to do this:

    from googletrans import Translator
    import re
    
    translator = Translator()
    txtorig = u'User "%(first_name)s %(last_name)s (%(email)s)" has been assigned.'
    
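    # Temporarily replace variables of the form "%(example_name)s" with "__n__"
    # to protect them during translate().
    VAR = re.compile(r'%\(\w+\)s')     # matches e.g. %(first_name)s
    REPL = re.compile(r'__(\d+)__')    # matches the temporary tokens
    varlist = []

    def replace(matchobj):
        # Remember the variable and return its numbered token.
        varlist.append(matchobj.group())
        return "__%d__" % (len(varlist) - 1)

    def restore(matchobj):
        # Look up the original variable by the token's number.
        return varlist[int(matchobj.group(1))]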
    
    txtorig = VAR.sub(replace, txtorig)
    txttrans = translator.translate(txtorig, src='en', dest='fr').text
    txttrans = REPL.sub(restore, txttrans)
    
    print(txttrans)
    

    Here is the result:

    L'utilisateur "%(first_name)s %(last_name)s (%(email)s)" a été attribué.
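
If you need this for more than one string, the same protect-and-restore idea can be wrapped in a function. The following is only a sketch under some assumptions: `translate_protected` and `fake_translate` are hypothetical names, the `translate` argument is any callable that maps a string to a string (with googletrans it could wrap `translator.translate(...).text`), and the token pattern additionally tolerates spaces inside `__n__` in case a translator inserts them. The stand-in translator is there only so the example runs offline.

```python
import re

# Matches named %-style placeholders such as %(first_name)s.
PLACEHOLDER = re.compile(r'%\(\w+\)s')
# Matches the temporary tokens; optional spaces are tolerated (an assumption).
TOKEN = re.compile(r'__\s*(\d+)\s*__')


def translate_protected(text, translate):
    """Translate `text` while shielding %(name)s placeholders.

    `translate` is a hypothetical hook: any callable taking and returning str.
    """
    saved = []  # local to each call, so repeated calls do not interfere

    def protect(match):
        saved.append(match.group())
        return "__%d__" % (len(saved) - 1)

    def restore(match):
        index = int(match.group(1))
        # Leave unknown tokens alone instead of raising IndexError.
        return saved[index] if index < len(saved) else match.group()

    return TOKEN.sub(restore, translate(PLACEHOLDER.sub(protect, text)))


def fake_translate(s):
    # Stand-in for a real translator, for offline demonstration only.
    return (s.replace("User", "L'utilisateur")
             .replace("has been assigned", "a été attribué"))


result = translate_protected(
    'User "%(first_name)s %(last_name)s (%(email)s)" has been assigned.',
    fake_translate,
)
print(result)
```

Keeping the list of saved placeholders inside the function avoids the shared global `varlist`, which would otherwise keep growing if the code were run on several strings in a row.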