Tags: python, web-scraping, urllib2, basic-authentication, http-status-code-405

Python 405 error with basic authentication


I'm trying to write a web scraper to automate some of the things I have to do at work. To use the web application on the site, I need to log in with basic authentication (I know the scheme is Basic). In a web browser, I go to the URL, a prompt pops up asking for my username and password, which I enter, and then I'm taken to a second login page (just in case that's relevant).

Here's what I do in Python, using urllib2:

import urllib2

theurl = 'http://where-i-work.com/the-backend'
# Nothing seems to work if the server knows I'm using Python
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:42.0) Gecko/20100101 Firefox/42.0'

# I've seen this value sent in my browser's requests, and I double-checked it
# by base64-encoding my username and password
headers = {'User-Agent': user_agent,
           'Authorization': 'Basic myauthorizationstring12345'}

data = ''
req = urllib2.Request(theurl, data, headers)
handle = urllib2.urlopen(req)

When I try to create handle, the urlopen call raises a 405 error: Method Not Allowed. I've read that 99% of the time, that means I tried to send POST values when I shouldn't have. But I've also seen this information sent in requests when I use Tamper Data. Just to see, I tried sending the header info in data instead (URL-encoded) and I got a 401 error, as if I had never sent the login credentials.

Again, just to try, I also used https instead of http, which yielded a "certificate verify failed" error, which I understand to be a separate issue. Basically, I've been trying whatever I can think of.

I've also tried using urllib2.HTTPPasswordMgrWithDefaultRealm() and urllib2.HTTPBasicAuthHandler(), etc. and I still got the 405 error. Right now I'm going with what I put up there because I want to see everything that is happening while I'm still trying to figure this out.

Do I just not understand how the credentials are being sent normally by a browser? Am I doing something different?


Solution

  • You're passing a value other than None for the data parameter (even an empty string counts), so your request is being sent as a POST. Pass headers as a keyword argument and don't specify a value for data, and the request will go out as a GET:

    req = urllib2.Request(theurl, headers=headers)
    

    An alternative approach would be to use the requests module:

    import requests
    
    response = requests.get(
        url='http://where-i-work.com/the-backend',
        headers={
            'User-Agent': 'Mozilla'
        },
        auth=('username', 'password')
    )
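
To check the method without touching the network, here is a minimal sketch. It assumes Python 3, where urllib.request is the successor to urllib2 and applies the same rule: a request becomes a POST whenever data is anything other than None. The helper name basic_auth_header and the 'username' / 'password' placeholders are made up for illustration; the header it builds follows the standard Basic scheme (base64 of "username:password").

```python
import base64
import urllib.request  # Python 3 successor to urllib2 (assumption: not the asker's Python 2 setup)

# Placeholder values for illustration only, not real credentials.
THE_URL = 'http://where-i-work.com/the-backend'
USERNAME = 'username'
PASSWORD = 'password'


def basic_auth_header(username, password):
    """Hypothetical helper: build the value of an HTTP Basic Authorization header."""
    raw = '{}:{}'.format(username, password).encode('utf-8')
    return 'Basic ' + base64.b64encode(raw).decode('ascii')


headers = {
    'User-Agent': 'Mozilla/5.0',
    'Authorization': basic_auth_header(USERNAME, PASSWORD),
}

# Any data value other than None, even an empty one, turns the request into a POST.
post_req = urllib.request.Request(THE_URL, b'', headers)

# Leaving data out keeps the default method, GET.
get_req = urllib.request.Request(THE_URL, headers=headers)

# No request is sent here; get_method() only reports what would be used.
print(post_req.get_method())  # POST
print(get_req.get_method())   # GET
```

Inspecting get_method() before calling urlopen is a cheap way to confirm what the server will see. If the GET version still fails, the problem is more likely in the credentials or the second login page than in the HTTP method.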