Tags: python, urllib, pac

Using urllib.request returns Proxy Auto-Config file


I am using Martin Konecny's code to query an HTTP site from behind my corporate firewall.

The code is this:

    import urllib.request

    req = urllib.request.Request(
        'http://www.espncricinfo.com/',
        data=None,
        headers={
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
        }
    )

    f = urllib.request.urlopen(req)
    g = open('writing.txt', 'w')
    g.write(f.read().decode('utf-8'))
    g.close()

However, when I run this code, I receive the PAC file rather than the contents of the URL.

How do I get past this and download the contents of the website at the given URL?

Thank you!


Solution

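    import urllib.request

    req = urllib.request.Request(
        'http://www.espncricinfo.com/',
        data=None,
        headers={
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
        }
    )

    # 'ip:port' is a placeholder: replace it with the address and port of
    # your proxy (for example, one of the proxies listed in the PAC file).
    proxy_support = urllib.request.ProxyHandler({'http': 'ip:port'})
    opener = urllib.request.build_opener(proxy_support)
    # Make the opener object the global default opener, so that
    # urllib.request.urlopen() routes requests through the proxy.
    urllib.request.install_opener(opener)

    f = urllib.request.urlopen(req)

    g = open('writing.txt', 'w')
    g.write(f.read().decode('utf-8'))
    g.close()  # note the parentheses: a bare g.close never closes the file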
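
If you would rather not change the global default opener, the same idea can be sketched with a private opener. This is only a sketch under assumptions: the helper names `build_proxy_opener`, `looks_like_pac` and `fetch` are made up for illustration, the proxy address is still something you have to supply yourself, and the PAC check is just a heuristic based on the fact that a PAC file is JavaScript defining `FindProxyForURL`.

```python
import urllib.request


def build_proxy_opener(proxy_address):
    """Return an opener that routes HTTP and HTTPS requests through a proxy.

    proxy_address is a 'host:port' string that you must supply yourself
    (hypothetical here); it is not discovered automatically.
    """
    proxy_support = urllib.request.ProxyHandler({
        'http': proxy_address,
        'https': proxy_address,
    })
    # build_opener returns a private opener; the global default is untouched.
    return urllib.request.build_opener(proxy_support)


def looks_like_pac(text):
    """Heuristic check: PAC files are JavaScript defining FindProxyForURL."""
    return 'FindProxyForURL' in text


def fetch(url, proxy_address, user_agent='Mozilla/5.0'):
    """Fetch url through the proxy and return the decoded response body."""
    req = urllib.request.Request(url, headers={'User-Agent': user_agent})
    opener = build_proxy_opener(proxy_address)
    with opener.open(req, timeout=30) as response:
        body = response.read().decode('utf-8', errors='replace')
    if looks_like_pac(body):
        # Still getting the PAC file suggests the proxy address is wrong.
        raise RuntimeError('Received a PAC file instead of the page.')
    return body
```

Calling something like `fetch('http://www.espncricinfo.com/', 'ip:port')` with your real proxy address in place of `'ip:port'` should then return the page text, which you can write to `writing.txt` inside a `with open(...)` block so the file is closed automatically.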