Tags: authentication, web-scraping, python-requests, session-cookies

How do I figure out how to authenticate myself using HTTP requests?


I am trying to log in to a site using requests as follows:

s = requests.Session()
login_data = {"userName":"username", "password":"pass", "loginPath":"/d2l/login"}
resp = requests.post("https://d2l.pima.edu/d2l/login?login=1", login_data)

Although I get a 200 response, when I run

print(resp.content)
b"<!DOCTYPE html><html><head><meta charset='utf-8' /><script>var hash = window.location.hash;if( hash ) hash = '%23' + hash.substring( 1 );window.location.replace('/d2l/login?sessionExpired=0&target=%2fd2l%2ferror%2f404%2flog%3ftargetUrl%3dhttp%253A%252F%252Fd2l.pima.edu%253A80%252Fd2l%252Flogin%253Flogin%253D1' + hash );</script><title></title></head><body></body></html>" 

Notice that the redirect mentions sessionExpired. What I've tried so far: logging out and back in using the actual browser (no success) and HTTP basic auth (no success).

I'm thinking maybe I need to authenticate myself to this site using cookies?

If so, how do I determine which cookies to send?

I tried figuring this out by saying

resp.cookies
Out[4]: <RequestsCookieJar[]> 

Shouldn't this give me the names of the cookies? I'm not sure what to do with this output. Main point: how do I figure out how to authenticate myself to this website? Help is appreciated. I would rather not use Selenium.


Solution

  • If you load https://d2l.pima.edu/d2l/login and view its source, you'll see that the form's POST target path is /d2l/lp/auth/login/login.d2l, not /d2l/login?login=1. Try posting to that path instead. Your other fields look consistent with what the form expects.

    Note: with Python requests, if you create a Session object, use it to make your requests rather than calling requests.post directly:

    resp = s.post("https://d2l.pima.edu/d2l/lp/auth/login/login.d2l", login_data)
    

    The session will hold any cookies set by the login server, and you can continue to use the s object to make requests in the authenticated session.
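
Putting the two points together, here is a minimal sketch of the whole flow. It is not tested against the real site: it assumes the login form is present in the static HTML (it may instead be built by JavaScript), and the helper names find_form_actions, login, and main are made up for illustration. The URLs and field names are the ones from the question and the answer above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

import requests


class FormActionParser(HTMLParser):
    """Collect the action attribute of every <form> tag on a page."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            action = dict(attrs).get("action")
            if action:
                self.actions.append(action)


def find_form_actions(html, page_url):
    """Return the absolute POST targets of the forms found in html."""
    parser = FormActionParser()
    parser.feed(html)
    return [urljoin(page_url, action) for action in parser.actions]


def login(session, login_page_url, post_url, form_data):
    """Load the login page, then POST the form through the same session."""
    # Fetching the page first lets the session store any pre-login cookies.
    session.get(login_page_url)
    resp = session.post(post_url, data=form_data)
    resp.raise_for_status()
    return resp


def main():
    """Hypothetical usage; call main() yourself to hit the real site."""
    s = requests.Session()
    login_page = "https://d2l.pima.edu/d2l/login"
    page = s.get(login_page)
    # Shows which URLs the page's forms post to (may be empty if built by JS).
    print(find_form_actions(page.text, login_page))

    login_data = {"userName": "username", "password": "pass", "loginPath": "/d2l/login"}
    post_url = "https://d2l.pima.edu/d2l/lp/auth/login/login.d2l"
    resp = login(s, login_page, post_url, login_data)
    # Cookies live on the session, not only on the last response.
    print(resp.status_code, s.cookies.get_dict())
```

Inspecting s.cookies rather than resp.cookies is usually more informative: resp.cookies only holds cookies set by that one response, while the session jar accumulates cookies across every request made through s, including redirects.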