python authentication web-scraping python-requests

Access authenticated page using Python Requests


I'm trying to write a simple scraper to get usage details on my internet account. I've successfully written it in PowerShell, but I'd like to move it to Python for ease of use/deployment. If I print r.text (the result of the POST to the login page), I just get the login page form back again.

I think the solution might be something along the lines of using prepare_request? Apologies if I'm missing something super obvious, it's been about 5 years since I touched Python ^^

import requests
USERNAME = 'usernamehere'
PASSWORD = 'passwordhere'
loginURL = 'https://myaccount.amcom.com.au/ClientLogin.aspx'
secureURL = 'https://myaccount.amcom.com.au/FibreUsageDetails.aspx'

session = requests.session()
req_headers = {'Content-Type': 'application/x-www-form-urlencoded'}

formdata = {
    'ctl00$MemberToolsContent$txtUsername': USERNAME,
    'ctl00$MemberToolsContent$txtPassword': PASSWORD,
    'ctl00$MemberToolsContent$btnLogin' : 'Login'
}

session.get(loginURL)
r = session.post(loginURL, data=formdata, headers=req_headers, allow_redirects=False)
r2 = session.get(secureURL)

I've referenced these threads in my attempts:

HTTP POST and GET with cookies for authentication in python
Authentication and python Requests

PowerShell script for reference:

$r=Invoke-WebRequest -Uri 'https://myaccount.amcom.com.au/ClientLogin.aspx' -UseDefaultCredentials -SessionVariable RequestForm
$r.Forms[0].Fields['ctl00$MemberToolsContent$txtUsername'] = "usernamehere"
$r.Forms[0].Fields['ctl00$MemberToolsContent$txtPassword'] = "passwordhere"
$r.Forms[0].Fields['ctl00$MemberToolsContent$btnLogin'] = "Login"

$response = Invoke-WebRequest -Uri 'https://myaccount.amcom.com.au/ClientLogin.aspx' -WebSession $RequestForm -Method POST -Body $r.Forms[0].Fields -ContentType 'application/x-www-form-urlencoded'
$response2 = Invoke-WebRequest -Uri 'https://myaccount.amcom.com.au/FibreUsageDetails.aspx' -WebSession $RequestForm

Solution

  • import requests
    import re
    from bs4 import BeautifulSoup
    
    user="xyzmohsin"
    passwd="abcpassword"
    
    s=requests.Session()
    headers={"User-Agent":"Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36"}
    s.headers.update(headers)
    
    login_url="https://myaccount.amcom.com.au/ClientLogin.aspx"
    r=s.get(login_url)
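    # Name the parser explicitly; without it bs4 warns and uses whichever parser is installed
    soup=BeautifulSoup(r.content, "html.parser")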
    RadMasterScriptManager_TSM=soup.find(src=re.compile("RadMasterScriptManager_TSM"))['src'].split("=")[-1]
    EVENTTARGET=soup.find(id="__EVENTTARGET")['value']
    EVENTARGUMENT=soup.find(id="__EVENTARGUMENT")['value']
    VIEWSTATE=soup.find(id="__VIEWSTATE")['value']
    VIEWSTATEGENERATOR=soup.find(id="__VIEWSTATEGENERATOR")['value']
    
    
    # requests URL-encodes form keys itself, so the field names use a literal "$";
    # a pre-encoded "%24" would be sent as "%2524" and would not match the form fields
    data={"RadMasterScriptManager_TSM":RadMasterScriptManager_TSM,
    "__EVENTTARGET":EVENTTARGET,
    "__EVENTARGUMENT":EVENTARGUMENT,
    "__VIEWSTATE":VIEWSTATE,
    "__VIEWSTATEGENERATOR":VIEWSTATEGENERATOR,
    "ctl00_TopMenu_RadMenu_TopNav_ClientState":"",
    "ctl00$MemberToolsContent$HiddenField_Redirect":"",
    "ctl00$MemberToolsContent$txtUsername":user,
    "ctl00$MemberToolsContent$txtPassword":passwd,
    "ctl00$MemberToolsContent$btnLogin":"Login"}
    
    headers={"Content-Type":"application/x-www-form-urlencoded",
    "Host":"myaccount.amcom.com.au",
    "Origin":"https://myaccount.amcom.com.au",
    "Referer":"https://myaccount.amcom.com.au/ClientLogin.aspx"}
    
    r=s.post(login_url,data=data,headers=headers)
    

    I don't have a username and password, so I couldn't test the headers in the final POST request. If it doesn't work, please remove Host, Origin and Referer from the final POST request's headers.

    Hope that helps :-)
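
A likely reason the original three-field POST just returns the login form is that the page expects its hidden fields (such as __VIEWSTATE) to be posted back along with the credentials, which is what the PowerShell version does by reusing $r.Forms[0].Fields. Instead of naming each hidden field by hand, the fields can be collected generically. The following is a minimal, untested-against-the-real-site sketch using only the standard library; the helper names (FormFieldParser, extract_form_fields, build_login_payload, still_on_login_page) are made up for illustration. It does not reproduce the RadMasterScriptManager_TSM value, which the code above derives from a script URL, and it ignores checkboxes and radio buttons.

```python
from html.parser import HTMLParser

# Field names taken from the question's form data.
USERNAME_FIELD = "ctl00$MemberToolsContent$txtUsername"
PASSWORD_FIELD = "ctl00$MemberToolsContent$txtPassword"
LOGIN_BUTTON_FIELD = "ctl00$MemberToolsContent$btnLogin"

# Button-like inputs are only submitted when clicked, so they are skipped
# here and the login button is added explicitly later.
BUTTON_TYPES = {"submit", "button", "image", "reset"}


class FormFieldParser(HTMLParser):
    """Collect name/value pairs from the non-button <input> elements of a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        input_type = (attr.get("type") or "text").lower()
        if name and input_type not in BUTTON_TYPES:
            # Inputs without a value attribute are treated as empty strings.
            self.fields[name] = attr.get("value") or ""


def extract_form_fields(html):
    """Return a dict of input names to values found in the HTML text."""
    parser = FormFieldParser()
    parser.feed(html)
    return parser.fields


def build_login_payload(html, username, password):
    """Start from the page's own fields, then overlay the credentials."""
    payload = extract_form_fields(html)
    payload[USERNAME_FIELD] = username
    payload[PASSWORD_FIELD] = password
    payload[LOGIN_BUTTON_FIELD] = "Login"
    return payload


def still_on_login_page(html):
    """Heuristic: the response still contains the password input."""
    return PASSWORD_FIELD in extract_form_fields(html)
```

With a requests session this would be used roughly as `payload = build_login_payload(s.get(login_url).text, user, passwd)` followed by `r = s.post(login_url, data=payload)`. If `still_on_login_page(r.text)` is true afterwards, the login was most likely rejected.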