python, python-2.7, python-3.x, python-requests, urllib2

How can I recreate a urllib.request call in Python 2.7?


I'm crawling some web pages and parsing data from them, but one of the sites seems to be blocking my requests. The Python 3 version of the code, which uses urllib.request, works fine. My problem is that I need to use Python 2.7, and I can't get a response using urllib2.

Shouldn't these requests be identical?

Python 3 version:

import urllib.request

def fetch_title(url):
    req = urllib.request.Request(
        url, 
        data=None, 
        headers={
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
        }
    )
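    # read() returns bytes, which have no .encode(); decode first (assuming the page is UTF-8),
    # then escape non-ASCII characters to get an ASCII-only string
    html = urllib.request.urlopen(req).read().decode('utf-8').encode('unicode-escape').decode('ascii')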

    return html

Python 2.7 version:

import urllib2

opener = urllib2.build_opener()
opener.addheaders = [(
            'User-Agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
        )]
response = opener.open('http://website.com')

print response.read()

Solution

  • The following code should work. In Python 2.7, you can put the desired headers in a dictionary, pass it to urllib2.Request, and hand the resulting request object to urllib2.urlopen.

    import urllib2
    
    def fetch_title(url):
        my_headers = {
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36"
        }
        return urllib2.urlopen(urllib2.Request(url, headers=my_headers)).read()
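If the same script has to run under both Python 2.7 and Python 3, one possible approach is to import `Request` and `urlopen` from whichever module is available and build the request the same way in both cases. This is only a sketch: the names `USER_AGENT`, `build_request` and `fetch_html` are made up for illustration, and it has not been tested against the site from the question.

```python
try:
    # Python 3
    from urllib.request import Request, urlopen
except ImportError:
    # Python 2.7
    from urllib2 import Request, urlopen

# Same User-Agent string as in the question (the constant name is hypothetical).
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36"
)


def build_request(url):
    """Build a request object that carries the custom User-Agent header."""
    return Request(url, headers={"User-Agent": USER_AGENT})


def fetch_html(url):
    """Fetch the page and return the raw response body (bytes on Python 3)."""
    return urlopen(build_request(url)).read()
```

You can check which header will be sent without touching the network: `build_request('http://website.com').get_header('User-agent')` returns the User-Agent string. Note the capitalization of the key, since `Request` stores header names via `str.capitalize()`.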