
urllib.request.urlopen(url) with Authentication


I've been playing with Beautiful Soup and parsing web pages for a few days. One line of code has been my saviour in every script I write:

    r = requests.get('some_url', auth=('my_username', 'my_password'))

But I want to do the same thing (open a URL with authentication) with urllib, where I currently have:

    sauce = urllib.request.urlopen(url).read()  # (1)
    soup = bs.BeautifulSoup(sauce, "html.parser")  # (2)

I'm not able to open and read a web page that requires authentication. How do I achieve something like this:

    sauce = urllib.request.urlopen(url, auth=(username, password)).read()  # (3)

instead of (1)?

Solution

  • Have a look at the HOWTO Fetch Internet Resources Using The urllib Package from the official docs:

    import urllib.request

    # Placeholder values; substitute your own.
    username = "my_username"
    password = "my_password"
    top_level_url = "http://example.com/foo/"
    a_url = top_level_url + "bar.html"  # any URL under top_level_url

    # create a password manager
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

    # Add the username and password.
    # If we knew the realm, we could use it instead of None.
    password_mgr.add_password(None, top_level_url, username, password)

    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

    # create "opener" (OpenerDirector instance)
    opener = urllib.request.build_opener(handler)

    # use the opener to fetch a URL (equivalent of line (1) with credentials)
    sauce = opener.open(a_url).read()

    # Install the opener.
    # Now all calls to urllib.request.urlopen use our opener.
    urllib.request.install_opener(opener)
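If you want something closer to the one-liner in (3), a minimal sketch is to attach the HTTP Basic `Authorization` header yourself, which is also what `requests` does when you pass `auth=(user, password)`. The helper names `make_basic_auth_request` and `urlopen_with_auth` are hypothetical, not part of urllib. One design difference worth knowing: `HTTPBasicAuthHandler` only sends credentials after the server answers with a 401 challenge, whereas this version sends them on the first request, which can help with servers that never issue the challenge.

```python
import base64
import urllib.request


def make_basic_auth_request(url, username, password):
    """Hypothetical helper: build a Request carrying HTTP Basic credentials.

    The credentials are sent up front instead of waiting for a 401 challenge.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


def urlopen_with_auth(url, username, password, timeout=30):
    """Hypothetical rough urllib analogue of requests.get(url, auth=(username, password))."""
    request = make_basic_auth_request(url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Usage, mirroring lines (3) and (2) from the question:
# sauce = urlopen_with_auth(url, username, password)
# soup = bs.BeautifulSoup(sauce, "html.parser")
```

Basic auth only base64-encodes the password, so use it over HTTPS.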