I have issues scraping certain websites, while others work. For example, this works:
page = requests.get('https://wsj.com/', proxies=proxydict)
But this doesn't:
page = requests.get('https://www.privateequityinternational.com/', proxies=proxydict)
I get a "max retries" error, even though I only scrape 1 page (and haven't scraped it before).
I've tried sending a custom header to the websites that won't scrape, but it hasn't worked. Is there a specific header I should use? How do I scrape the second website shown above (www.privateequityinternational.com)? Thank you.
The issue is that the page is served over http, not https; you get a warning from Google when you try to access the page using https. A plain http request succeeds:
In [1]: import requests
   ...: page = requests.get('http://www.privateequityinternational.com/')
   ...:

In [2]: page
Out[2]: <Response [200]>
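If you don't know in advance which sites only work over http, one option is to try the URL as given and retry over plain http when the connection fails. Below is a minimal sketch of that idea, not a definitive implementation. The helper names to_http and get_with_http_fallback, and the get / retry_on parameters, are hypothetical and chosen only for this example. It assumes the "max retries" failure surfaces as requests.exceptions.ConnectionError (which SSLError subclasses).

```python
from urllib.parse import urlsplit, urlunsplit


def to_http(url):
    """Return url with an https scheme downgraded to http; other URLs pass through."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        parts = parts._replace(scheme="http")
    return urlunsplit(parts)


def get_with_http_fallback(url, get, retry_on, **kwargs):
    """Try url as given; if it raises one of retry_on, retry once over http.

    get is any requests.get-compatible callable and retry_on is a tuple of
    exception classes (both are hypothetical parameters for this sketch).
    """
    try:
        return get(url, **kwargs)
    except retry_on:
        fallback = to_http(url)
        if fallback == url:
            raise  # the URL was already http, so there is nothing left to try
        return get(fallback, **kwargs)


# Possible usage with requests (assumes proxydict is defined as in the question):
#
#   import requests
#   page = get_with_http_fallback(
#       'https://www.privateequityinternational.com/',
#       get=requests.get,
#       retry_on=(requests.exceptions.ConnectionError,),
#       proxies=proxydict,
#       timeout=10,
#   )
```

Keep in mind that the fallback request is unencrypted, so this is only reasonable for pages you would be willing to fetch over http anyway.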