python web-scraping http-headers python-requests screen-scraping

Web-Scraping Max Retries Rejected


I have issues scraping certain websites, while others work. For example, this works:

page = requests.get('https://wsj.com/', proxies=proxydict)

But this doesn't:

page = requests.get('https://www.privateequityinternational.com/', proxies=proxydict)

I get a "max retries" error, even though I only scrape one page (and haven't scraped it before).

I've tried sending a header with the requests to the websites that won't scrape, but it hasn't worked. Is there a specific header I should use? How do I scrape the second website shown above (www.privateequityinternational.com)? Thank you.


Solution

  • The issue is that the page is served over http, not https, and that is also how your browser loads it. You get a warning from Google when you try to access the page over https. Requesting over plain http works:

    In [1]: import requests
       ...: page = requests.get('http://www.wsj.com')
       ...: 
    
    In [2]: page
    Out[2]: <Response [200]>
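
If you don't know in advance which sites only answer over http, one option is to try https first and retry over http when the connection fails. The following is a minimal sketch of that idea, not a definitive fix: `to_http` and `get_with_http_fallback` are hypothetical helper names, and the `get` parameter exists only so the function can be exercised without network access. It relies on `requests` raising `requests.exceptions.ConnectionError` (of which `SSLError` is a subclass) for "max retries exceeded" failures.

```python
import requests
from urllib.parse import urlsplit, urlunsplit


def to_http(url):
    """Return the same URL with its scheme replaced by plain http."""
    return urlunsplit(urlsplit(url)._replace(scheme="http"))


def get_with_http_fallback(url, get=requests.get, **kwargs):
    """Try the URL as given; on a connection failure over https, retry over http.

    Extra keyword arguments (e.g. proxies=..., headers=..., timeout=...)
    are passed through to the underlying get call unchanged.
    """
    try:
        return get(url, **kwargs)
    except requests.exceptions.ConnectionError:
        # Only fall back when the failing request was https;
        # otherwise there is nothing different to retry.
        if urlsplit(url).scheme != "https":
            raise
        return get(to_http(url), **kwargs)


if __name__ == "__main__":
    page = get_with_http_fallback(
        "https://www.privateequityinternational.com/", timeout=10
    )
    print(page.status_code)
```

Keep in mind that the fallback request is unencrypted, so this is only reasonable for public pages where you send no credentials.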