Tags: python, google-cloud-platform, etl, google-translate

Google Translate API bottleneck


I'm working on an ETL pipeline that takes too long to run. After profiling it, I found that the slowest part is where I use the Google Cloud Translate API to translate keywords that don't yet have translations in my DB. It becomes a bottleneck when there are many keywords to translate. Here's the code:

from google.cloud import translate_v2 as gt

gt_client = gt.Client(target_language="de")
keywd_translated = {}
# no_translations: keywords that have no translation in the DB yet
for keywd in no_translations:
    # One API request per keyword
    keywd_translated[keywd] = gt_client.translate(keywd)["translatedText"]
    if keywd_translated[keywd] == "":
        keywd_translated[keywd] = keywd

The problem is that this loop takes a long time when there are many keywords to translate: it accounts for about 10 of the pipeline's 13 minutes. Is there a way to optimize this code, or my use of the API, to make it faster? Any suggestions would be greatly appreciated. Thanks!

I tried converting this piece of code to use asyncio, but saw no noticeable improvement.


Solution

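  • According to the API docs, Client.translate() accepts a list of strings as well as a single string, so you can translate many keywords in one request instead of making one request per keyword: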

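    from google.cloud import translate_v2 as gt
    
    def translate(words, to_language="de"):
        """Translate a list (or set) of keywords, not a single str, in one request."""
        client = gt.Client(target_language=to_language)
        result = {}
        # Given a list, translate() returns a list of dicts; each dict's
        # "input" key holds the original string that was sent
        for value in client.translate(list(words)):
            original = value["input"]
            trans = value["translatedText"]
            # Keep the original keyword if the API returns an empty string
            result[original] = trans if trans != "" else original
        return result
    
    keywd_translated = translate(no_translations)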
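The translate() helper above sends the whole list in a single request, but the v2 API limits how many text segments one request may carry (128 is the commonly cited limit; treat that number as an assumption and check the current quotas page). With thousands of keywords, a minimal sketch is to split the list into batches and make one call per batch. To keep the sketch testable without credentials, the actual API call is passed in as a callable, and chunked, translate_all and MAX_SEGMENTS_PER_REQUEST are hypothetical helper names, not part of the client library:

```python
from typing import Callable, Dict, Iterator, List, Sequence

# Assumption: 128 text segments per request is the commonly cited limit for
# the Translation API v2; check the current quotas before relying on it.
MAX_SEGMENTS_PER_REQUEST = 128


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def translate_all(
    words: Sequence[str],
    translate_batch: Callable[[List[str]], List[dict]],
    batch_size: int = MAX_SEGMENTS_PER_REQUEST,
) -> Dict[str, str]:
    """Translate `words` with one request per batch instead of one per word.

    `translate_batch` takes a list of strings and returns a list of dicts with
    "input" and "translatedText" keys, which is what the v2 client's
    translate() returns when given a list.
    """
    result: Dict[str, str] = {}
    for batch in chunked(words, batch_size):
        for value in translate_batch(batch):
            original = value["input"]
            trans = value["translatedText"]
            # Keep the original keyword if the API returns an empty string
            result[original] = trans if trans != "" else original
    return result
```

In the pipeline this could be wired up as translate_all(list(no_translations), gt.Client(target_language="de").translate), which sends roughly one request per 128 keywords rather than one request per keyword.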