Tags: python, python-3.x, debugging, urllib

python 3 urllib and http.client - unable to turn on debug messages


Hi Stackoverflow community,

I'm trying to get familiar with the urllib.request standard library module and use it in my scripts at work instead of wget. However, I'm unable to get the detailed HTTP messages displayed, whether in IDLE, from a script file, or when typing the commands manually into cmd (py).

I'm using Python on Windows 7 x64, and tried 3.5 and 3.6 including 3.6.1rc1 without success.

The messages are supposedly turned on using this command:

http.client.HTTPConnection.debuglevel = 1

Here is my sample code. It works, but no details are displayed:

import http.client
import urllib.request
http.client.HTTPConnection.debuglevel = 1
response = urllib.request.urlopen('http://stackoverflow.com')
content = response.read()
with open("stack.html", "wb") as file:
    file.write(content)

I have also tried .set_debuglevel(1), without success. There is a years-old question here, "Turning on debug output for python 3 urllib", but it suggests the same thing I already have, and it does not work for me. In a comment on that question, user Yen Chi Hsuan says it's a bug and reported it here: https://bugs.python.org/issue26892

The bug was closed in June 2016, so I would expect it to be fixed in recent Python versions.

Maybe I'm missing something (e.g. something else needs to be enabled or installed), but I have spent some time on this and reached a dead end.

Is there a working way to have the detailed HTTP messages displayed with urllib on Python 3 on Windows?

Thank you

EDIT: The response suggested by pvg works on the simple example, but I cannot make it work in a case where a login is needed. The HTTPBasicAuthHandler does not have this debuglevel attribute, and when I try combining multiple handlers into the opener it does not work either.

import urllib.request

userName = 'mylogin'
passWord = 'mypassword'
top_level_url = 'http://page-to-login.com'

# create an authorization handler
passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, top_level_url, userName, passWord)

auth_handler = urllib.request.HTTPBasicAuthHandler(passman)
opener = urllib.request.build_opener(auth_handler)
urllib.request.install_opener(opener)

result = opener.open(top_level_url)
content = result.read()

Solution

  • The example in the issue you linked shows working code; a version is reproduced below:

    import urllib.request
    
    handler = urllib.request.HTTPHandler(debuglevel=10)
    opener = urllib.request.build_opener(handler)
    content = opener.open('http://stackoverflow.com').read()
    
    print(content[0:120])
    

    This is pretty clunky. Another option is to use a friendlier library like urllib3 (http://urllib3.readthedocs.io/en/latest/).

    import urllib3
    
    urllib3.add_stderr_logger()
    http = urllib3.PoolManager()
    r = http.request('GET', 'http://stackoverflow.com')
    print(r.status)
    

    If you decide to use the requests library instead, the following answer describes how to set up logging:

    How can I see the entire HTTP request that's being sent by my Python application?
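
For the login case in the question's EDIT, one possible approach is to pass the debug-enabled handlers to build_opener together with the auth handler. The sketch below assumes the same placeholder URL and credentials as the question; the helper name build_debug_opener is hypothetical and not part of urllib. HTTPHandler and HTTPSHandler instances passed this way take the place of the default ones, and HTTPSHandler is included on the assumption that the login may redirect to https (it exists only when Python is built with SSL support). The debug output is printed to stdout.

```python
import urllib.request


def build_debug_opener(top_level_url, user_name, pass_word, debuglevel=1):
    """Hypothetical helper: basic auth plus HTTP debug output in one opener."""
    # Same password manager setup as in the question.
    passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passman.add_password(None, top_level_url, user_name, pass_word)
    auth_handler = urllib.request.HTTPBasicAuthHandler(passman)

    # The debug level is set on the protocol handlers, not on the auth handler.
    http_handler = urllib.request.HTTPHandler(debuglevel=debuglevel)
    https_handler = urllib.request.HTTPSHandler(debuglevel=debuglevel)

    # build_opener accepts any number of handlers; these instances are used
    # instead of the default HTTPHandler and HTTPSHandler.
    return urllib.request.build_opener(auth_handler, http_handler, https_handler)


# Placeholder values from the question; replace with real ones before use.
opener = build_debug_opener('http://page-to-login.com', 'mylogin', 'mypassword')

# Uncomment to send the request and see the headers printed:
# content = opener.open('http://page-to-login.com').read()
```

If the request and response headers still do not appear, it may be worth checking that the request actually goes through this opener (opener.open, or urlopen after install_opener) rather than a default one.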