python google-cloud-platform etl google-translate

Google Translate API bottleneck

I'm working on an ETL pipeline that takes too long to run. After checking which part of the code takes the longest, I found this: I'm using the Google Cloud Translate API to translate keywords that don't have translations in my db, and it becomes a bottleneck when there are many keywords to translate. Here's the code I'm using:

from google.cloud import translate_v2 as gt

gt_client = gt.Client(target_language="de")
for keywd in no_translations:
    keywd_translated[keywd] = gt_client.translate(keywd)["translatedText"]
    if keywd_translated[keywd] == "":
        keywd_translated[keywd] = keywd

The problem is that this code takes a long time to execute when there are a lot of keywords to translate (this part accounts for 10 of the 13 minutes of run time). Is there a way to optimize this code, or my use of the API, to make it faster? Any suggestions would be greatly appreciated. Thanks!

I tried converting this piece of code to use asyncio, but with no noticeable improvement.

Solution

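The API docs say you can pass multiple values per call: translate() accepts a list of strings and returns a list of dicts, one per input. That way you pay for one network round trip per batch instead of one per keyword: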

from google.cloud import translate_v2 as gt

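def translate(words, to_language="de"):
    """Translate a list of strings in one call (words must be a list, not a single str)."""
    client = gt.Client(target_language=to_language)
    result = {}
    # Passing a list returns one dict per input string.
    for value in client.translate(words):
        original = value["input"]
        trans = value["translatedText"]
        # Fall back to the original keyword if the translation comes back empty.
        result[original] = trans if trans != "" else original
    return result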
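
If the keyword list is very large, a single call may be rejected: as far as I know the Basic (v2) API caps the number of text segments per request (128 is the figure I remember, so check the current quota page). Below is a minimal sketch of splitting the list into batches and removing duplicates first. The helper name translate_in_batches, its translate_batch parameter, and the default batch_size of 100 are my own assumptions, not part of the library; translate_batch is any callable that takes a list of strings and returns dicts with "input" and "translatedText" keys, which is the shape the loop above already relies on.

```python
from typing import Callable, Dict, Iterable, List


def translate_in_batches(
    words: Iterable[str],
    translate_batch: Callable[[List[str]], List[dict]],
    batch_size: int = 100,  # assumed safe value; adjust to the documented limit
) -> Dict[str, str]:
    """Translate words in chunks, one call to translate_batch per chunk."""
    # Drop duplicates (keeping first-seen order) so no keyword is sent twice.
    unique = list(dict.fromkeys(words))
    result: Dict[str, str] = {}
    for start in range(0, len(unique), batch_size):
        batch = unique[start:start + batch_size]
        for item in translate_batch(batch):
            translated = item["translatedText"]
            # Fall back to the original keyword on an empty translation.
            result[item["input"]] = translated or item["input"]
    return result
```

With the client from the question this would be called as translate_in_batches(no_translations, gt_client.translate). Taking the translate function as a parameter also makes it easy to test the batching logic with a fake function, without hitting the API.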