Tags: python, http, proxy, urllib2

Using an HTTP PROXY - Python


I'm familiar with the fact that I should set the HTTP_PROXY environment variable to the proxy address.
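For reference, setting the variable from within Python looks roughly like this (the address below is a placeholder for my real proxy):

    import os

    # http_proxy must be set before urllib2 builds its default opener,
    # i.e. before the first urlopen() call; the address is a placeholder.
    os.environ["http_proxy"] = "http://proxy.example.com:8080"

    import urllib2
    print urllib2.urlopen("http://www.google.com").read()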

Generally urllib works fine; the problem is dealing with urllib2.

>>> urllib2.urlopen("http://www.google.com").read()

returns

urllib2.URLError: <urlopen error [Errno 10061] No connection could be made because the target machine actively refused it>

or

urllib2.URLError: <urlopen error [Errno 11004] getaddrinfo failed>

Extra info:

urllib.urlopen(....) works fine! It is just urllib2 that is playing tricks...
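For comparison, urllib's urlopen also accepts an explicit proxies argument, which may be why it behaves differently; a minimal sketch with a placeholder address:

    import urllib

    # Python 2's urllib.urlopen() takes a proxies mapping directly;
    # the proxy address below is a placeholder.
    html = urllib.urlopen("http://www.google.com",
                          proxies={"http": "http://proxy.example.com:8080"}).read()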

I tried @Fenikso's answer, but I'm getting this error now:

URLError: <urlopen error [Errno 10060] A connection attempt failed because the 
connected party did not properly respond after a period of time, or established
connection failed because connected host has failed to respond>      

Any ideas?


Solution

  • You can do it even without the HTTP_PROXY environment variable. Try this sample:

    import urllib2
    
    # Route HTTP requests through the given proxy (example address)
    proxy_support = urllib2.ProxyHandler({"http": "http://61.233.25.166:80"})
    opener = urllib2.build_opener(proxy_support)
    urllib2.install_opener(opener)  # make this opener the default for urlopen()
    
    html = urllib2.urlopen("http://www.google.com").read()
    print html
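    If you do not want to install the opener globally, the same opener object can be used directly: html = opener.open("http://www.google.com").read().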
    

    In your case it really seems that the proxy server is refusing the connection.
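
    A quick way to check whether the proxy is reachable at all, independent of urllib2, is a plain socket connection (a minimal sketch, using the example proxy address from above):

    import socket
    
    host, port = "61.233.25.166", 80  # example proxy from above
    
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(5)  # fail fast instead of hanging like Errno 10060
    try:
        s.connect((host, port))
        print "proxy is reachable"
    except socket.error as e:
        print "cannot reach proxy:", e
    finally:
        s.close()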


    Something more to try:

    import urllib2
    
    #proxy = "61.233.25.166:80"
    proxy = "YOUR_PROXY_GOES_HERE"
    
    proxies = {"http": "http://%s" % proxy}
    url = "http://www.google.com/search?q=test"
    # Some servers reject urllib2's default User-Agent, so send a browser-like one
    headers = {'User-agent': 'Mozilla/5.0'}
    
    proxy_support = urllib2.ProxyHandler(proxies)
    # debuglevel=1 makes httplib print the request/response traffic to stdout
    opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler(debuglevel=1))
    urllib2.install_opener(opener)
    
    req = urllib2.Request(url, None, headers)
    html = urllib2.urlopen(req).read()
    print html
    

    Edit 2014: This seems to be a popular question/answer. However, today I would use the third-party requests module instead.

    For a single request, just do:

    import requests
    
    r = requests.get("http://www.google.com", 
                     proxies={"http": "http://61.233.25.166:80"})
    print(r.text)
    

    For multiple requests, use a Session object so you do not have to pass the proxies parameter with every request:

    import requests
    
    s = requests.Session()
    s.proxies = {"http": "http://61.233.25.166:80"}
    
    r = s.get("http://www.google.com")
    print(r.text)
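
    Note that the proxies mapping in requests is keyed by URL scheme, so https:// URLs need their own "https" entry; a minimal sketch, assuming the same proxy handles both:

    import requests
    
    s = requests.Session()
    s.proxies = {
        "http": "http://61.233.25.166:80",
        "https": "http://61.233.25.166:80",  # assuming the same proxy for HTTPS
    }
    
    r = s.get("https://www.google.com")
    print(r.text)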