
Logging in using Mechanize


I am trying to extract some data from a website - not a lot - but enough to warrant a little script... I am attempting to first log in to the site https://squashlevels.com using mechanize and cookielib, but I am failing...

I currently have

from bs4 import BeautifulSoup
import requests
import re
import urllib2 
import cookielib
import mechanize

cj = cookielib.CookieJar()
br = mechanize.Browser()

br.set_cookiejar(cj)
br.open("https://squashlevels.com/menu_login.php")

# How do I log in?

r = requests.get('https://squashlevels.com/players.php?all&club=1314')
soup = BeautifulSoup(r.content, "html.parser")

## Do stuff...

What code should I be using to log into this site?

Thanks for your time.


Solution

  • Here's a solution using just requests; I'm not sure mechanize would add anything here. A requests.Session stores the cookies set by the login response and sends them automatically with later requests, including the one to players.php.
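
    To see the cookie handling without touching the network, here is a minimal sketch. It assumes a cookie named PHPSESSID with a made-up value (both hypothetical; the real site may use a different name) and only prepares a request, it never sends one:

```python
import requests

session = requests.Session()

# Simulate a cookie that a login response might have set.
# The name "PHPSESSID" and the value are hypothetical placeholders.
session.cookies.set("PHPSESSID", "abc123", domain="squashlevels.com", path="/")

# Build (but do not send) a later request through the same session.
request = requests.Request("GET", "https://squashlevels.com/players.php?all&club=1314")
prepared = session.prepare_request(request)

# The session has attached its stored cookie to the outgoing request.
print(prepared.headers.get("Cookie"))
```

    A plain requests.get call has no such jar behind it, which is why the request in the question arrives without any login cookies.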

    The one wrinkle is that the site also expects the MD5 hash of the password to be posted along with the password itself:

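    import hashlib

    import requests
    from bs4 import BeautifulSoup

    # Placeholder credentials; substitute your own account details
    email = 'user@example.com'
    password = 'secret'
    
    # The session keeps the login cookies for the requests that follow
    s = requests.Session()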
    s.post('https://squashlevels.com/menu_login.php', data={
        'action': 'login',
        'email': email,
        'password': password,
        'md5password': hashlib.md5(password.encode('utf-8')).hexdigest()
    })
    
    r = s.get('https://squashlevels.com/players.php?all&club=1314')
    soup = BeautifulSoup(r.content, 'html.parser')
    
    for row in soup.select('table.ranking tr'):
        print([col.text.strip() for col in row.select('td')])
    

    Output:

    ['1', 'Nathan Miller', 'Bluecoat Sports Horsham', 'East England Masters 2018/19', '6', '15 Dec 2018', '4,706', '70%', '']
    ['2', 'Kit Pearman', 'Dorking', 'Surrey Winter League 2018/19', '2', '20 Nov 2018', '4,469', '64%', '']
    ['3', 'Simon Millard', 'Bluecoat Sports Horsham', 'Sussex Mens League 2018/2019', '1', '04 Dec 2018', '2,680', '57%', '']
    ...
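
    Every cell comes back as a string. If you want numbers for sorting or arithmetic, something like the following might do as a starting point. It is a rough sketch that assumes the formats seen in the sample output above (thousands separators and a trailing percent sign); the function name is made up, and since the column meanings aren't stated here it converts by pattern only:

```python
import re


def normalise_cell(text):
    """Convert '4,706' to 4706 and '70%' to 0.7; return anything else unchanged."""
    # Plain integers, or integers with thousands separators
    if re.fullmatch(r"\d+|\d{1,3}(?:,\d{3})+", text):
        return int(text.replace(",", ""))
    # Whole-number percentages
    if re.fullmatch(r"\d+%", text):
        return int(text[:-1]) / 100
    return text


# First row of the sample output above
row = ['1', 'Nathan Miller', 'Bluecoat Sports Horsham',
       'East England Masters 2018/19', '6', '15 Dec 2018', '4,706', '70%', '']
cleaned = [normalise_cell(cell) for cell in row]
print(cleaned)
```

    Dates, names and empty cells are left as strings, so they can be handled separately if needed.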