
Google Translation API is blocking my IP address for too many requests


I'm setting up a Django view that requests product data from an API, parses it with BeautifulSoup, translates it with the googletrans module, and saves the result to my PostgreSQL database.

Everything was working fine yesterday until, suddenly, Google blocked my IP address for making too many requests at once.

I just turned on my LTE to change my IP address and it worked.

But now, to make sure it doesn't happen with this IP address too, I need a way to call the googletrans API in batches, or some other solution that keeps me from getting blocked again.

This is my view:

from bs4 import BeautifulSoup
from django.shortcuts import render
from googletrans import Translator
import requests
import json


def api_data(request):
    if request.GET.get('mybtn'):  # to improve, == 'something':
        resp_1 = requests.get(
            "https://www.headout.com/api/public/v1/product/listing/list-by/city?language=fr&cityCode=PARIS&limit=5000&currencyCode=CAD",
            headers={
                "Headout-Auth": HEADOUT_PRODUCTION_API_KEY
            })
        resp_1_data = resp_1.json()
        base_url_2 = "https://www.headout.com/api/public/v1/product/get/"

        translator = Translator()

        for item in resp_1_data['items']:
            print('translating item {}'.format(item['id']))
            # append the product ID and the language query to the base URL
            url = '{}{}?language=fr'.format(base_url_2, item['id'])

            # make the HTTP request
            resp_2 = requests.get(
                url,
                headers={
                    "Headout-Auth": HEADOUT_PRODUCTION_API_KEY
                })
            resp_2_data = resp_2.json()

            descriptiontxt = resp_2_data['contentListHtml'][0]['html'][0:2040] + ' ...'

            # Parsing works
            soup = BeautifulSoup(descriptiontxt, 'lxml')
            parsed = soup.find('p').text

            # Translation doesn't work
            translation = translator.translate(parsed, dest='fr')

            titlename = item['name']
            titlefr = translator.translate(titlename, dest='fr')

            destinationname = item['city']['name']
            destinationfr = translator.translate(destinationname, dest='fr')

            Product.objects.get_or_create(
                title=titlefr.text,
                destination=destinationfr.text,
                description=translation.text,
                link=item['canonicalUrl'],
                image=item['image']['url']
            )

    return render(request, "form.html")

How can I call the Google Translation API in batches? Or is there another solution?

Please help.

EDIT

Based on @ddor254's answer, where should I put the time.sleep(2)?

This is what I came up with. Is this okay?

  Product.objects.get_or_create(
      title=titlefr.text,
      destination=destinationfr.text,
      description=translation.text,
      link=item['canonicalUrl'],
      image=item['image']['url']
  )
  time.sleep(2)  # here

or like this:

resp_1 = requests.get(
            "https://www.headout.com/api/public/v1/product/listing/list-by/city?language=fr&cityCode=PARIS&limit=5000&currencyCode=CAD",
            headers={
                "Headout-Auth": HEADOUT_PRODUCTION_API_KEY
            }, time.sleep(2)) #here

I just want to make sure it's the right way to do it before I risk getting this new IP address blocked as well.
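For reference, a pause has to be its own statement inside the loop body; passing time.sleep(2) as an extra argument to requests.get() after a keyword argument is a syntax error. Here is a minimal sketch of the first placement, assuming a fixed delay between items is acceptable. The helper name process_with_pause and its parameters are hypothetical, and the 2-second value is just the number from the comment above, not a documented limit.

```python
import time


def process_with_pause(items, handle, pause_seconds=2.0, sleep=time.sleep):
    """Call handle(item) for each item, pausing between consecutive calls.

    `handle` stands in for the per-product work (fetch, parse, translate,
    save). `sleep` is injectable so the loop can be tested without waiting.
    """
    results = []
    for index, item in enumerate(items):
        if index > 0:
            # The pause is a separate statement in the loop body,
            # executed before every item except the first.
            sleep(pause_seconds)
        results.append(handle(item))
    return results
```

In the view, the body of the for loop would become the handle function, and resp_1_data['items'] would be passed as items.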


Solution

  • I suggest you read this MDN article on the HTTP 429 (Too Many Requests) status code: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429

    If this is the response you get, look at the Retry-After header in the response object.

    Adding a sleep or another delay, using the value of that header, might fix your problem.
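The Retry-After idea above can be sketched as follows. This is only an illustration under some assumptions: it presumes you have access to a response object with status_code and headers (as with the requests library), which googletrans may not expose, and the names retry_after_seconds and get_with_backoff, the 2-second default, and the attempt limit are all hypothetical.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, default=2.0):
    """Convert a Retry-After header value into a number of seconds to wait.

    The header may hold either a delay in seconds or an HTTP date.
    `default` (an assumed fallback) is used when the header is missing
    or cannot be parsed.
    """
    if not header_value:
        return default
    try:
        return max(0.0, float(header_value))
    except ValueError:
        pass  # not a plain number, try the HTTP-date form next
    try:
        retry_at = parsedate_to_datetime(header_value)
    except (TypeError, ValueError):
        return default
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    remaining = (retry_at - datetime.now(timezone.utc)).total_seconds()
    return max(0.0, remaining)


def get_with_backoff(fetch, url, max_attempts=5, sleep=time.sleep):
    """Call fetch(url), waiting and retrying while the server answers 429.

    `fetch` is any callable returning an object with `status_code` and
    `headers`, for example a small wrapper around requests.get.
    Returns the last response, which may still be a 429 if every
    attempt was rejected.
    """
    response = None
    for _ in range(max_attempts):
        response = fetch(url)
        if response.status_code != 429:
            return response
        sleep(retry_after_seconds(response.headers.get("Retry-After")))
    return response
```

With requests, usage might look like get_with_backoff(lambda u: requests.get(u, headers={"Headout-Auth": HEADOUT_PRODUCTION_API_KEY}), url). Making the sleep function a parameter keeps the retry logic testable without real delays.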