
Stop urllib.request from raising exceptions on HTTP errors


Python's urllib.request.urlopen() raises an HTTPError exception when the server responds with an error status code (e.g., 404).

This is because the default opener uses the HTTPDefaultErrorHandler class:

A class which defines a default handler for HTTP error responses; all responses are turned into HTTPError exceptions.

Even if you build your own opener, it (un)helpfully includes the HTTPDefaultErrorHandler for you implicitly.

If, however, you don't want Python to raise an exception on a non-OK response, it's unclear how to disable this behavior.
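
To make the behavior concrete, here is a minimal sketch that reproduces it without relying on an external site. The local test server, the AlwaysNotFound handler and the try_fetch helper are hypothetical names introduced only for this illustration; the sketch assumes no HTTP proxy is configured in the environment.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class AlwaysNotFound(BaseHTTPRequestHandler):
    """Hypothetical test handler: answers every GET with a 404."""

    def do_GET(self):
        body = b"nothing here"
        self.send_response(404)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def try_fetch(url):
    """Return (raised, status) for a plain urlopen() call."""
    try:
        with urllib.request.urlopen(url) as response:
            return False, response.status
    except urllib.error.HTTPError as err:
        # HTTPError also behaves like a response, so the code is still available.
        status = err.code
        err.close()
        return True, status


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), AlwaysNotFound)
threading.Thread(target=server.serve_forever, daemon=True).start()

raised, status = try_fetch(f"http://127.0.0.1:{server.server_port}/doesnt-exist")

server.shutdown()
server.server_close()

print(raised, status)  # prints: True 404
```

Catching HTTPError like this is the usual workaround, since the exception object still exposes the status code, headers and body. It does not, however, stop the exception from being raised in the first place.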


Solution

  • If you build your own opener with build_opener(), the documentation notes (emphasis added):

    Instances of the following classes will be in front of the handlers, unless the handlers contain them, instances of them or subclasses of them: ... HTTPDefaultErrorHandler ...

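    The elided part of that list also includes HTTPErrorProcessor, the processor that hands non-2xx responses to the error handlers (and so, ultimately, to HTTPDefaultErrorHandler). Therefore, we can make our own subclass of HTTPErrorProcessor that simply passes the response through the pipeline unmodified. Then build_opener() will use our processor instead of the default one, and the error handler that raises is never invoked.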

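    import urllib.request

    class NonRaisingHTTPErrorProcessor(urllib.request.HTTPErrorProcessor):
        # Return every response as-is instead of routing non-2xx ones to the error handlers.
        def http_response(self, request, response):
            return response

        https_response = http_response

    # Passing the subclass keeps build_opener() from adding the default HTTPErrorProcessor.
    opener = urllib.request.build_opener(NonRaisingHTTPErrorProcessor)
    response = opener.open('http://example.com/doesnt-exist')
    print(response.status)  # prints 404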

    This answer (including the code sample) was not written by ChatGPT, but it did point out the solution.
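
For anyone who wants to check this without hitting example.com, the following is a rough, self-contained sketch that runs the same opener against a throwaway local server. DemoHandler, the /moved path and the variable names are made up for the illustration, and it assumes no HTTP proxy is configured in the environment. It also shows a side effect worth knowing about: redirects are dispatched through the same HTTPErrorProcessor step, so with this processor a 302 comes back as-is instead of being followed.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: /moved redirects, everything else is a 404."""

    def do_GET(self):
        if self.path == "/moved":
            self.send_response(302)
            self.send_header("Location", "/elsewhere")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"not found"
        self.send_response(404)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


class NonRaisingHTTPErrorProcessor(urllib.request.HTTPErrorProcessor):
    # Same idea as in the answer: hand back every response untouched.
    def http_response(self, request, response):
        return response

    https_response = http_response


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

opener = urllib.request.build_opener(NonRaisingHTTPErrorProcessor)

with opener.open(base + "/doesnt-exist") as response:
    missing_status = response.status
    missing_body = response.read()  # the error body is readable as usual

with opener.open(base + "/moved") as response:
    moved_status = response.status  # the redirect is returned, not followed

server.shutdown()
server.server_close()

print(missing_status, missing_body, moved_status)  # prints: 404 b'not found' 302
```

If automatic redirects still matter, one option is to pass through only the 4xx and 5xx responses and defer to the parent class's http_response() for everything else; that variation is a suggestion and not part of the original answer.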