Search code examples
pythonpython-3.xurllib

how can set headers (user-agent), retrieve a web page, capture redirects and accept cookies?


import urllib.request
url="http://espn.com"
f = urllib.request.urlopen(url)
contents = f.read().decode('latin-1')
q = f.geturl()
print(q)

This code will return http://espn.go.com/, which is what I want -- a redirect web site URL. After looking at the Python documentation, googling, etc., I can't figure out how to also:

  1. Capture redirected web site URL (already working)
  2. Change the user-agent on the out going request
  3. Accept any cookies the web page might want to send back

How can I do this in Python 3? If there is a better module than urllib, I am OK with this.


Solution

  • There is a better module, it's called requests:

    import requests
    
    session = requests.Session()
    session.headers['User-Agent'] = 'My-requests-agent/0.1'
    
    resp = session.get(url)
    contents = resp.text  # If the server said it's latin 1, this'll be unicode (ready decoded)
    print(resp.url)       # final URL, after redirects.
    

    requests follows redirects (check resp.history to see what redirects it followed). By using a session (optional), cookies are stored and passed on to subsequent requests. You can set headers per request or per session (so the same extra headers will be sent with every request sent out for that session).