Tags: http-headers, urllib, python-3.6

HTTP Basic Authentication not working with Python 3


I am trying to access an intranet site with HTTP Basic Authentication enabled.

Here's the code I'm using:

from bs4 import BeautifulSoup
import urllib.request, base64, urllib.error

request = urllib.request.Request(url)
string = '%s:%s' % ('username','password')

base64string = base64.standard_b64encode(string.encode('utf-8'))

request.add_header("Authorization", "Basic %s" % base64string)
try:
    u = urllib.request.urlopen(request)
except urllib.error.HTTPError as e:
    print(e)
    print(e.headers)

soup = BeautifulSoup(u.read(), 'html.parser')

print(soup.prettify())

But it doesn't work: the request fails with 401 Authorization Required, and I can't figure out why.


Solution

  • The following approach, which uses a password manager with HTTPBasicAuthHandler, works without any modifications.

    from bs4 import BeautifulSoup
    import urllib.request

    # Placeholder values; replace them with your own.
    username = "username"
    password = "password"
    top_level_url = "http://example.com/foo/"
    url = top_level_url  # the page to fetch, at or below top_level_url

    # create a password manager
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

    # Add the username and password.
    # If we knew the realm, we could use it instead of None.
    password_mgr.add_password(None, top_level_url, username, password)
    
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    
    # create "opener" (OpenerDirector instance)
    opener = urllib.request.build_opener(handler)
    
    # use the opener to fetch a URL
    u = opener.open(url)
    
    soup = BeautifulSoup(u.read(), 'html.parser')
    

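    The code from the question works as well, with one change. base64.standard_b64encode() returns bytes, so "Basic %s" % base64string inserts the repr of the bytes object and the header becomes Basic b'...', which the server rejects with 401. Decode the Base64 result to a str before building the header: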

    from bs4 import BeautifulSoup
    import urllib.request, base64, urllib.error
    
    request = urllib.request.Request(url)
    string = '%s:%s' % ('username','password')
    
    base64string = base64.standard_b64encode(string.encode('utf-8'))
    
    request.add_header("Authorization", "Basic %s" % base64string.decode('utf-8'))
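    try:
        u = urllib.request.urlopen(request)
    except urllib.error.HTTPError as e:
        # The request failed at the HTTP level; show the status and headers.
        print(e)
        print(e.headers)
    else:
        # Parse the page only if the request succeeded, so u is always defined here.
        soup = BeautifulSoup(u.read(), 'html.parser')
        print(soup.prettify())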
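
Both approaches above can be checked offline, without contacting a server. The sketch below is illustrative only: the credentials and the example.com / example.org URLs are placeholders, and find_user_password() is called directly just to show which credentials the handler would look up for a given URL.

```python
import base64
import urllib.request

# Placeholder credentials, not real ones.
username, password = "username", "password"

# 1. Header approach: compare the bytes and str forms of the token.
token = base64.standard_b64encode(("%s:%s" % (username, password)).encode("utf-8"))
broken_header = "Basic %s" % token                  # embeds the bytes repr
fixed_header = "Basic %s" % token.decode("utf-8")   # embeds the Base64 text
print(broken_header)  # Basic b'dXNlcm5hbWU6cGFzc3dvcmQ='
print(fixed_header)   # Basic dXNlcm5hbWU6cGFzc3dvcmQ=

# 2. Password-manager approach: realm None acts as a catch-all, and the
#    credentials apply to every URL at or below the registered one.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "http://example.com/foo/", username, password)
inside = password_mgr.find_user_password("Some Realm", "http://example.com/foo/page.html")
outside = password_mgr.find_user_password("Some Realm", "http://example.org/")
print(inside)   # ('username', 'password')
print(outside)  # (None, None)
```

The first printed line is the malformed value that the code in the question sends, which explains the 401. The last two lines show that the password manager only offers the credentials for URLs under the registered prefix, whatever realm the server announces.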