python proxy web-scraping urllib http-proxy

Scraping web-page data with urllib with headers and proxy


I can already fetch web-page data, but now I want to fetch it through a proxy. How can I do that?

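import urllib.request        # "import urllib" alone does not load urllib.request
import lxml.html as lh       # lh was used below but never imported

# URL and headers are assumed to be defined elsewhere in the script.
def get_main_html():
    request = urllib.request.Request(URL, headers=headers)
    doc = lh.parse(urllib.request.urlopen(request))
    return doc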

Solution

  • From the documentation:

    urllib will auto-detect your proxy settings and use those. This is through the ProxyHandler, which is part of the normal handler chain when a proxy setting is detected. Normally that’s a good thing, but there are occasions when it may not be helpful. One way to do this is to setup our own ProxyHandler, with no proxies defined. This is done using similar steps to setting up a Basic Authentication handler.

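    The same ProxyHandler can instead be given a dictionary that maps URL schemes ("http", "https") to proxy URLs, so that your requests go through the proxy you choose. See the "Proxies" section of the urllib HOWTO: https://docs.python.org/3/howto/urllib2.html#proxies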
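Below is a minimal sketch of how the question's Request-with-headers code might be combined with ProxyHandler, using only the standard library. PROXY_URL and the helper names build_opener_with_proxy, build_opener_without_proxy and fetch are hypothetical; substitute your own proxy address and names.

```python
import urllib.request

# Hypothetical proxy address; replace with your proxy's scheme, host and port.
PROXY_URL = "http://127.0.0.1:8080"


def build_opener_with_proxy(proxy_url):
    """Return an opener that sends http and https traffic through proxy_url."""
    # ProxyHandler maps URL schemes to proxy URLs. Passing an instance to
    # build_opener replaces the default, auto-detecting ProxyHandler.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def build_opener_without_proxy():
    """Return an opener that ignores proxy settings (the case quoted from the docs)."""
    # An empty dict defines no proxies; passing it still stops build_opener
    # from adding the default ProxyHandler that reads environment settings.
    return urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url, headers, opener):
    """Send a GET request with custom headers through the given opener."""
    request = urllib.request.Request(url, headers=headers)
    # opener.open() accepts a Request object, just like urlopen().
    with opener.open(request, timeout=10) as response:
        return response.read()
```

For example, `html = fetch(URL, headers, build_opener_with_proxy(PROXY_URL))` returns the page bytes, which can then be parsed with `lh.fromstring(html)`. If you would rather keep calling `urlopen()` directly, `urllib.request.install_opener(opener)` makes the opener the global default.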