Tags: web-scraping, exponential-backoff, retrying

How do I insert a backoff script into my web scraper?


I want to use the "backoff" package in my web scraper and I cannot get it to work. Where do I insert it? How do I get "r = requests..." to still be recognized?

I've tried putting the statement into my code in various ways and it is not working. I want to be able to use this for the package's intended purpose. Thanks!

Code to insert:

@backoff.on_exception(backoff.expo,
                      requests.exceptions.RequestException,
                      max_time=60)

def get_url(what goes here?):
    return requests.get(what goes here?)

EXISTING CODE:

import os
import requests
import re
import backoff
from bs4 import BeautifulSoup

asin_list = ['B079QHML21']
urls = []
print('Scrape Started')
for asin in asin_list:
  product_url = f'https://www.amazon.com/dp/{asin}'
  urls.append(product_url)
  base_search_url = 'https://www.amazon.com'
  scraper_url = 'http://api.scraperapi.com'

  while len(urls) > 0:
    url = urls.pop(0)
    payload = {'api_key': key, 'url': url}  #--specific parameters; must be a dict, not a set
    r = requests.get(scraper_url, params=payload)
    print("we got a {} response code from {}".format(r.status_code, url))
    soup = BeautifulSoup(r.text, 'lxml')

    #Scraping Below#

I expect the "backoff" code to work as it's designed, retrying 500 errors so the scrape does not fail.


Solution

  • Instead of calling requests.get directly:

    requests.get(scraper_url, params=payload)
    

    change get_url to do exactly that (this answers the "what goes here?" placeholders), and then call get_url:

    @backoff.on_exception(backoff.expo,
                          requests.exceptions.RequestException,
                          max_time=60)
    
    def get_url(scraper_url, payload):
        return requests.get(scraper_url, params=payload)
    

    and in your code instead of:

    r = requests.get(scraper_url, params=payload)
    

    do:

    r = get_url(scraper_url, payload)
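
Putting it together, here is a minimal sketch of the whole scraper with the decorator in place. It assumes ScraperAPI-style api_key/url parameters (the key below is only a placeholder) and adds an optional raise_for_status() call, which is one way to make an HTTP 500 response raise a RequestException so that backoff retries it:

import backoff
import requests
from bs4 import BeautifulSoup

# Retry on any requests exception, with exponential backoff, for up to 60 seconds.
@backoff.on_exception(backoff.expo,
                      requests.exceptions.RequestException,
                      max_time=60)
def get_url(scraper_url, payload):
    response = requests.get(scraper_url, params=payload)
    # Optional: raise_for_status() turns HTTP error responses (e.g. 500) into
    # requests.exceptions.HTTPError, a RequestException, so backoff retries them too.
    response.raise_for_status()
    return response

asin_list = ['B079QHML21']
urls = [f'https://www.amazon.com/dp/{asin}' for asin in asin_list]
scraper_url = 'http://api.scraperapi.com'
api_key = 'YOUR_SCRAPERAPI_KEY'  # placeholder, not a real key

print('Scrape Started')
while urls:
    url = urls.pop(0)
    payload = {'api_key': api_key, 'url': url}  # assumed ScraperAPI parameters
    r = get_url(scraper_url, payload)           # retried automatically on failure
    print("we got a {} response code from {}".format(r.status_code, url))
    soup = BeautifulSoup(r.text, 'lxml')
    # Scraping below

Without raise_for_status(), requests.get() returns a 500 response without raising anything, so backoff.on_exception would only retry network-level failures such as timeouts and connection errors.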