Tags: python, html, xml, urllib2, http-status-code-301

Python 2.7 urllib2 raising urllib2.HTTPError 301 when hitting redirect with xml content


I'm using urllib2 to request a particular S3 bucket at hxxp://s3.amazonaws.com/mybucket. Amazon sends back an HTTP 301 status along with some XML data (the redirect target being hxxp://mybucket.s3.amazonaws.com/). Instead of following the redirect, Python raises urllib2.HTTPError: HTTP Error 301: Moved Permanently.

According to the official Python docs at HOWTO Fetch Internet Resources Using urllib2, "the default handlers handle redirects (codes in the 300 range)".

Is Python handling this incorrectly (presumably because of the unexpected XML in the response), or am I doing something wrong? I've watched the traffic in Wireshark, and the response to Python's request is exactly the same as the one I get when using a web client. While debugging, I don't see the XML being captured anywhere in the response object.

Thanks for any guidance.

Edit: Sorry for not posting the code initially. It's nothing special, literally just this:

import urllib2, httplib

# site holds the bucket URL given above
request = urllib2.Request(site)
response = urllib2.urlopen(request)

Solution

  • You are better off using the requests library, which handles redirection by default: http://docs.python-requests.org/en/latest/user/quickstart/#redirection-and-history

    import requests
    
    response = requests.get(site)
    print(response.content)
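
To see what requests actually did with the redirect, you can look at response.history, which the quickstart page linked above describes. This is a minimal sketch; redirect_chain and fetch_chain are just names I picked for illustration, not part of requests.

```python
def redirect_chain(response):
    """List (status_code, url) for every hop, ending with the final response."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def fetch_chain(url):
    """Fetch url with requests and report the hops it went through."""
    # Third-party library; imported here so redirect_chain stays usable without it.
    import requests
    return redirect_chain(requests.get(url))
```

If the chain comes back with a single entry whose status is 301, requests handed you the redirect response itself instead of following it, and response.content should then hold whatever body the server sent along with it.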
    

    I don't understand what the problem with urllib2 is. I looked at the documentation (https://docs.python.org/2/library/urllib2.html), but it isn't very intuitive.
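
One way to dig into the urllib2 error is to catch it and look at what the server sent. HTTPError is also a file-like response object, so the status, the headers and the body (the XML, in this case) are all available on it. The sketch below is written so it imports under both Python 2.7 and Python 3; summarize_http_error and fetch_or_summarize are hypothetical helper names.

```python
try:  # Python 2.7, as in the question
    from urllib2 import HTTPError, urlopen
except ImportError:  # Python 3
    from urllib.error import HTTPError
    from urllib.request import urlopen


def summarize_http_error(err):
    """Collect the parts of an HTTPError that help when debugging a redirect."""
    return {
        "code": err.code,
        # None means the server named no target in a Location header.
        "location": err.headers.get("Location"),
        # HTTPError is file-like, so the response body can be read from it.
        "body": err.read(),
    }


def fetch_or_summarize(url):
    """Return the body on success, or a summary dict if an HTTPError was raised."""
    try:
        return urlopen(url).read()
    except HTTPError as err:
        return summarize_http_error(err)
```

If "location" turns out to be None, the redirect handler had nothing to follow, which would explain why the 301 surfaces as an exception. I have not checked whether that is what S3 does here, so treat it as a guess to verify.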

    It seems that in Python 3 it was refactored to be less of a burden to use, but I am still convinced that requests is the way to go.

    Note: The urllib2 module has been split across several modules in Python 3 named urllib.request and urllib.error. The 2to3 tool will automatically adapt imports when converting your sources to Python 3.
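
For reference, the question's snippet would look roughly like this in Python 3, using the urllib.request and urllib.error names from the note. It is only a sketch: fetch is a made-up helper name, and returning the error's status and body instead of raising is my own choice, not something urllib does for you.

```python
import urllib.error
import urllib.request


def fetch(url):
    """Return (status, body). HTTP errors are returned the same way, not raised."""
    request = urllib.request.Request(url)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # The error object carries the status and the body the server sent.
        return err.code, err.read()
```

Calling fetch(site) with the bucket URL would then give you the 301 and the XML side by side, without a traceback getting in the way.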