python, web-scraping, screen-scraping

Web Scraping in Python with a Login Page


I'm trying to do some web scraping to access my school grades using requests and Beautiful Soup, but I'm having a lot of trouble logging in. I just get this error:

TypeError: 'NoneType' object has no attribute '__getitem__'

Here's the code that I'm using:

import requests
from bs4 import BeautifulSoup

headers = {
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}

login_data = {
    'name': 'my_username',
    'pass': 'my_password',
    'form_id': 'new_login_form',
    'op': 'Login'
}

with requests.Session() as s:
    url = 'https://irc.d125.org'
    r = s.get(url, headers=headers)
    soup = BeautifulSoup(r.content, 'html5lib')
    login_data['form_build_id'] = soup.find('input', attrs={'name': 'form_build_id'})['value']
    r = s.post(url, data=login_data, headers=headers)
    print(r.content)

Any help is appreciated! Thanks so much!
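The TypeError comes from the line that reads form_build_id: Beautiful Soup's find() returns None when no tag matches, and subscripting None fails, so the page the script downloaded apparently has no <input name="form_build_id"> at all. Below is a minimal sketch of a more defensive way to read form fields before posting. It uses only the standard library's html.parser instead of Beautiful Soup, and the class name InputFieldParser, the helper extract_inputs, and the sample HTML are hypothetical illustrations, not part of the original code.

```python
from html.parser import HTMLParser


class InputFieldParser(HTMLParser):
    """Hypothetical helper: collect name/value pairs of every <input> tag."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of pairs.
        if tag != "input":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if name:
            # A missing or valueless "value" attribute is stored as "".
            self.fields[name] = attr_map.get("value") or ""


def extract_inputs(html):
    """Return a dict of input names to values found in the HTML string."""
    parser = InputFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Sample markup standing in for the downloaded login page (hypothetical).
page = """
<form id="new_login_form">
  <input type="hidden" name="form_build_id" value="form-abc123">
  <input type="text" name="name">
</form>
"""

fields = extract_inputs(page)
build_id = fields.get("form_build_id")
if build_id is None:
    # Fail with a clear message instead of subscripting None.
    raise RuntimeError("form_build_id not found; the login form differs from what the code expects")
print(build_id)
```

Checking for a missing field before using it turns the cryptic TypeError into a readable message, and printing extract_inputs(r.text) for the real page shows which fields its form actually contains.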


Solution

  • When the login button is pressed, the site sends an XHR request containing the login information to https://irc.d125.org/Login. The following should work; just replace the username and password placeholders with your own.


    import requests
    from bs4 import BeautifulSoup
    
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
    }
    
    login_data = {
        "UserName": "REPLACE_USER",  # Enter Username
        "Password": "REPLACE_PASSWORD",  # Enter password
        "RememberMe": False,
    }
    
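    with requests.Session() as s:
        url = 'https://irc.d125.org/Login'
        # Load the login page first so the session picks up any cookies it sets
        s.get(url, headers=headers)
        # Send the credentials to the endpoint the login XHR posts to
        r = s.post(url, data=login_data, headers=headers)
        print(r.text)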
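Because the answer passes a dict via data=, requests sends it form-encoded. To compare that body with the login XHR the browser sends (visible in the Network tab of the browser's developer tools), you can preview it with the standard library's urlencode, which for a flat dict of strings and booleans like this one produces the same encoding. Note that False goes over the wire as the string "False". This is a sketch for checking the payload, not part of the original answer.

```python
from urllib.parse import urlencode

# Same placeholder payload as in the answer; substitute real values to compare.
login_data = {
    "UserName": "REPLACE_USER",
    "Password": "REPLACE_PASSWORD",
    "RememberMe": False,
}

# Form-encode the dict the way requests does for a simple data= dict.
body = urlencode(login_data)
print(body)  # UserName=REPLACE_USER&Password=REPLACE_PASSWORD&RememberMe=False
```

If the browser's request shows a JSON body instead of form data, s.post(url, json=login_data, headers=headers) is the requests equivalent.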