I'm setting up a Django view that requests product data from an API, parses it with BeautifulSoup, applies the googletrans module, and saves the result into my PostgreSQL database.
Everything was working fine yesterday until, suddenly, Google blocked my IP address for making too many requests at once.
I turned on my LTE to change my IP address and it worked again.
But now, to make sure that it doesn't happen with this IP address as well, I need to find a way to call the googletrans API in batches, or any other solution that would prevent me from getting blocked again.
This is my view:
from bs4 import BeautifulSoup
from django.shortcuts import render
from googletrans import Translator
import requests
import json

def api_data(request):
    if request.GET.get('mybtn'):  # to improve, == 'something':
        resp_1 = requests.get(
            "https://www.headout.com/api/public/v1/product/listing/list-by/city?language=fr&cityCode=PARIS&limit=5000&currencyCode=CAD",
            headers={
                "Headout-Auth": HEADOUT_PRODUCTION_API_KEY
            })
        resp_1_data = resp_1.json()
        base_url_2 = "https://www.headout.com/api/public/v1/product/get/"
        translator = Translator()

        for item in resp_1_data['items']:
            print('translating item {}'.format(item['id']))
            # concat ID to the URL string
            url = '{}{}'.format(base_url_2, item['id'] + '?language=fr')
            # make the HTTP request
            resp_2 = requests.get(
                url,
                headers={
                    "Headout-Auth": HEADOUT_PRODUCTION_API_KEY
                })
            resp_2_data = resp_2.json()
            descriptiontxt = resp_2_data['contentListHtml'][0]['html'][0:2040] + ' ...'
            # Parsing work
            soup = BeautifulSoup(descriptiontxt, 'lxml')
            parsed = soup.find('p').text
            # Translation doesn't work
            translation = translator.translate(parsed, dest='fr')
            titlename = item['name']
            titlefr = translator.translate(titlename, dest='fr')
            destinationname = item['city']['name']
            destinationfr = translator.translate(destinationname, dest='fr')
            Product.objects.get_or_create(
                title=titlefr.text,
                destination=destinationfr.text,
                description=translation.text,
                link=item['canonicalUrl'],
                image=item['image']['url']
            )
    return render(request, "form.html")
How can I call the Google translation API in batches? Or is there another way to solve this?
Please help.
EDIT
Based on @ddor254's answer, where should I put the time.sleep(2)?
This is what I came up with. Is this okay?
Product.objects.get_or_create(
    title=titlefr.text,
    destination=destinationfr.text,
    description=translation.text,
    link=item['canonicalUrl'],
    image=item['image']['url']
)
time.sleep(2)  # here
or like this:
resp_1 = requests.get(
    "https://www.headout.com/api/public/v1/product/listing/list-by/city?language=fr&cityCode=PARIS&limit=5000&currencyCode=CAD",
    headers={
        "Headout-Auth": HEADOUT_PRODUCTION_API_KEY
    }, time.sleep(2))  # here
I just want to make sure that it's the right way to do it before risking getting this new IP blocked as well.
I suggest you read this article from MDN: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429
If this is the response you get, look at the Retry-After header in the response object.
Adding a sleep or another delay method with the value of that header might fix your problem.
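Here is a minimal sketch of that idea. It assumes the blocked request really does come back with status 429, and that the response object exposes `status_code` and `headers` the way `requests` responses do. The helper names `retry_after_seconds` and `call_with_retry_after` are made up for this example and are not part of `requests` or `googletrans`. I have not checked whether googletrans gives you access to the underlying response, so treat this as something to wrap around your own `requests.get` calls first.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, default=2.0):
    """Turn a Retry-After header value into a number of seconds to wait.

    The header holds either a whole number of seconds or an HTTP date.
    If it is missing or unparseable, fall back to `default`.
    """
    if header_value is None:
        return default
    value = header_value.strip()
    if value.isdigit():
        return float(value)
    try:
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if retry_at.tzinfo is None:
        # Treat a date without an offset as UTC.
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    remaining = (retry_at - datetime.now(timezone.utc)).total_seconds()
    return max(0.0, remaining)


def call_with_retry_after(send, max_attempts=5, default_wait=2.0, sleep=time.sleep):
    """Call send() again after a 429, waiting as long as Retry-After asks.

    `send` is any zero-argument callable returning a response-like object.
    The last response is returned even if it is still a 429.
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429 or attempt == max_attempts - 1:
            return response
        wait = retry_after_seconds(response.headers.get("Retry-After"), default_wait)
        sleep(wait)
    return response
```

In your view, each call could then be wrapped like `resp_2 = call_with_retry_after(lambda: requests.get(url, headers={"Headout-Auth": HEADOUT_PRODUCTION_API_KEY}))`. The fixed `time.sleep(2)` from your edit would go on its own line inside the `for` loop (your first variant), not as an argument to `requests.get`.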