I'm trying to write a script that calls the Google Translation API
to translate each line from an Excel file that has 1000 lines.
I'm using pandas to load the file and read the values from a specific column,
then I append each value to a list and use the Google API
to translate:
import os
from google.cloud import translate_v2 as translate
import pandas as pd
from datetime import datetime
# Variable for GCP service account credentials
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = r'path to credentials json'
# Path to the file
filepath = r'../file.xlsx'
# Instantiate the Google Translation API Client
translate_client = translate.Client()
# Read all the information from the Excel file within 'test' sheet name
df = pd.read_excel(filepath, sheet_name='test')
# Define an empty list
elements = []
# Loop the data frame and append the list
for i in df.index:
    elements.append(df['EN'][i])
# Loop the list and translate each line
for item in elements:
    output = translate_client.translate(
        elements,
        target_language='fr'
    )
    result = [
        element['translatedText'] for element in output
    ]
    print("The values corresponding to key : " + str(result))
After appending to the list, the total number of elements is 1000. The problem with the Google Translation API
is that if you send too many "text segments" (as they call them) in a single request, it returns the error below:
400 POST https://translation.googleapis.com/language/translate/v2: Too many text segments
I've investigated and found that sending 100 lines at a time (in my case) would be a solution, but now I'm a bit stuck.
How would I write the loop so that it iterates 100 lines at a time, translates those 100 lines, does something with the result, and then proceeds with the next 100, and so on until it reaches the end?
Assuming you are able to pass a list into a single translate call, perhaps you could do something like this:
# Define a helper to step thru the list in chunks
def chunker(seq, size):
    return (seq[pos : pos + size] for pos in range(0, len(seq), size))
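For instance, the helper walks a list in fixed-size steps, with the final chunk holding whatever is left over (the sample list here is just for illustration):

```python
# Demonstration of the chunking helper on a small list
def chunker(seq, size):
    return (seq[pos : pos + size] for pos in range(0, len(seq), size))

chunks = list(chunker(['a', 'b', 'c', 'd', 'e'], 2))
print(chunks)  # [['a', 'b'], ['c', 'd'], ['e']]
```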
# Then iterate and handle them accordingly
output = []
for chunk in chunker(elements, 100):
    temp = translate_client.translate(
        chunk,
        target_language='fr'
    )
    output.extend(temp)
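Once all the chunks have been translated, `output` holds one response dict per input line, in order, so you can attach the translations back to the DataFrame. A minimal sketch with a stand-in for the API response (the 'FR' column name and the sample data are hypothetical, not from the question):

```python
import pandas as pd

# Stand-in for df = pd.read_excel(filepath, sheet_name='test')
df = pd.DataFrame({'EN': ['hello', 'goodbye']})
# Stand-in for the accumulated API responses; the real `output` comes
# from the chunked translate loop above.
output = [{'translatedText': 'bonjour'}, {'translatedText': 'au revoir'}]

# translate_v2 returns one dict per segment with a 'translatedText' key
df['FR'] = [item['translatedText'] for item in output]
print(df['FR'].tolist())  # ['bonjour', 'au revoir']

# Optionally write the result back out:
# df.to_excel('translated.xlsx', index=False)
```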