I'm trying to write a script that calls the Google Translation API
to translate each line from an Excel file that has 1000 lines.
I'm using pandas to load the file and read the values from a specific column,
then I append each value to a list and use the Google API
to translate:
import os
from google.cloud import translate_v2 as translate
import pandas as pd
from datetime import datetime
# Variable for GCP service account credentials
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = r'path to credentials json'
# Path to the file
filepath = r'../file.xlsx'
# Instantiate the Google Translation API Client
translate_client = translate.Client()
# Read all the information from the Excel file within 'test' sheet name
df = pd.read_excel(filepath, sheet_name='test')
# Define an empty list
elements = []
# Loop the data frame and append the list
for i in df.index:
    elements.append(df['EN'][i])
# Loop the list and translate each line
for item in elements:
    output = translate_client.translate(
        elements,
        target_language='fr'
    )
    result = [
        element['translatedText'] for element in output
    ]
    print("The values corresponding to key : " + str(result))
After appending to the list, the total number of elements is 1000. The problem with the Google Translation API
is that if you send too many "text segments" (as they call them) in a single request, it returns the error below:
400 POST https://translation.googleapis.com/language/translate/v2: Too many text segments
I've investigated and found that sending 100 lines at a time (in my case) would be a solution, but now I'm a bit stuck.
How would I write the loop so that it iterates 100 lines at a time, translates those 100 lines, does something with the result, and then proceeds with the next 100, and so on until it reaches the end?
Assuming you are able to pass a list into a single translate call, perhaps you could do something like this:
# Define a helper to step thru the list in chunks
def chunker(seq, size):
    return (seq[pos : pos + size] for pos in range(0, len(seq), size))
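For instance, the helper walks a list in fixed-size steps, with the final chunk holding whatever is left over (the sample list here is just for illustration):

```python
# Demonstration of the chunking helper on a small list
def chunker(seq, size):
    return (seq[pos : pos + size] for pos in range(0, len(seq), size))

chunks = list(chunker(['a', 'b', 'c', 'd', 'e'], 2))
print(chunks)  # [['a', 'b'], ['c', 'd'], ['e']]
```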
# Then iterate and handle them accordingly
output = []
for chunk in chunker(elements, 100):
    temp = translate_client.translate(
        chunk,
        target_language='fr'
    )
    output.extend(temp)
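Once all the chunks have been translated, `output` holds one response dict per input line, in order, so you can attach the translations back to the DataFrame. A minimal sketch with a stand-in for the API response (the 'FR' column name and the sample data are hypothetical, not from the question):

```python
import pandas as pd

# Stand-in for df = pd.read_excel(filepath, sheet_name='test')
df = pd.DataFrame({'EN': ['hello', 'goodbye']})
# Stand-in for the accumulated API responses; the real `output` comes
# from the chunked translate loop above.
output = [{'translatedText': 'bonjour'}, {'translatedText': 'au revoir'}]

# translate_v2 returns one dict per segment with a 'translatedText' key
df['FR'] = [item['translatedText'] for item in output]
print(df['FR'].tolist())  # ['bonjour', 'au revoir']

# Optionally write the result back out:
# df.to_excel('translated.xlsx', index=False)
```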