I have a pandas dataframe of some 200k records. It has two columns: the text in English and a score. I want to translate the text column from English to a few other languages. For that, I'm using the Cloud Translation API from Google Cloud Platform (GCP). However, it's taking an absurdly long time to translate them. My code is basically this:
def translate_text(text, target_language):
    from google.cloud import translate_v2 as translate
    try:
        translate_client = translate.Client(credentials=credentials)
        result = translate_client.translate(text, target_language=target_language)
        return result['translatedText']
    except Exception as e:
        print(e)
and this:
df['X_language'] = df['text'].apply(lambda text: translate_text(text, '<LANG CODE>'))
I've seen that apply() is fairly slow, and the API's response time might be another factor, but is there any way to make it more efficient? I tried swifter, but that barely shaved off a couple of seconds (when testing against a subset of the dataframe).
Note that some of the text fields in the dataframe have around 300 characters in them. Not many but a decent number.
EDIT:
After importing translate from google.cloud and defining the client once outside the function, the code ran much quicker. However, for some reason when I try to pass a list (the rows of the 'text' column), it doesn't return the translated text; it just runs quickly and returns the list itself in English.
Might that have to do with the credentials I'm using? I'm passing the service account JSON file you get when you create a project in GCP.
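For reference, when translate() from translate_v2 is given a list of strings, it returns a list of dicts (one per input string), so the translated text has to be pulled out of each dict. Below is a minimal sketch of sending the column in chunks. The helper name translate_in_batches and the batch_size default of 100 are assumptions made for illustration (check the current per-request limits for your account), and client is expected to be an already-created translate.Client.

```python
def translate_in_batches(client, texts, target_language, batch_size=100):
    """Translate a list of strings in chunks, preserving input order.

    `client` is assumed to behave like google.cloud.translate_v2.Client:
    translate(list_of_str, target_language=...) returns a list of dicts,
    each with a 'translatedText' key.
    """
    translated = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        # One API request per chunk instead of one request per row.
        results = client.translate(chunk, target_language=target_language)
        translated.extend(r['translatedText'] for r in results)
    return translated
```

With a real client this could be used as df['X_language'] = translate_in_batches(translate_client, df['text'].tolist(), '<LANG CODE>'), which keeps the result aligned with the dataframe rows.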
EDIT 2:
I partitioned my dataframe into 4 parts, each with ~50k records. It still takes too much time. I even removed all text with more than 250 characters.
I think it's a translation API issue? It takes way too long to translate, I guess.
To fix the slow code, I just moved the import out of the function and created the translate client once, outside the function, instead of on every call.
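A minimal sketch of that fix: the client is built once and handed to a small factory, so apply() no longer re-imports the library or re-creates the client for every row. The name make_translator is hypothetical, and the commented lines show how it would be wired up with the real library.

```python
def make_translator(client, target_language):
    """Return a one-argument function that reuses an already-created client."""
    def translate_text(text):
        try:
            # For a single string, translate() returns a single dict.
            result = client.translate(text, target_language=target_language)
            return result['translatedText']
        except Exception as e:
            # Same behaviour as the original code: print the error, yield None.
            print(e)
            return None
    return translate_text

# Intended wiring (needs google-cloud-translate and valid credentials):
# from google.cloud import translate_v2 as translate
# translate_client = translate.Client(credentials=credentials)
# df['X_language'] = df['text'].apply(make_translator(translate_client, '<LANG CODE>'))
```

This still makes one request per row, so network latency remains the dominant cost; it only removes the per-call setup overhead.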
As for the 403 POST error, I had to create another GCP account. When I checked the quotas in the old (trial) account, nothing was exceeded or close to it, but the trial period had apparently ended and I no longer had the free credits ($400). I tried enabling billing for the API (and checked that my card still worked), but that didn't change much. Batch translation worked in my newer account.
So, it was just an account issue rather than an API issue.