
Log in to a website and then scrape data (Python)


Please correct me where I am going wrong here. The HTML I get back is just the page with the login form again. My idea was to get a token during one session and then use it to log in. Once logged in, I plan to use bs4 to collect some data.

import bs4
import requests

session = requests.session
with requests.Session() as s:
    url = 'https://www.planetplus.pl/'
    res = requests.get(url)
    data = res.text
    soup = bs4.BeautifulSoup(data, 'lxml')
    token = soup.find_all('input', attrs={'name': '__RequestVerificationToken'})[0]['value']
    print(token)
    payload = {'UserName': 'xxx', 'Password': 'yyy',
               '__RequestVerificationToken': token}
    p = s.post(url, data=payload)
    r = s.get('https://www.planetplus.pl/moje-konto-cashback')
    print(r.text)

Website link: https://www.planetplus.pl/

To be honest, I am a beginner, so if you could correct me, elaborate a little, and suggest the best way to do this, that would be great!

Additionally, how different is the procedure for http://www.exsite.pl/? There I removed the token from the login credentials dictionary, and the output is again the HTML of an "access restricted" page. And please do not judge the websites; they were simply the first ones I found with a different login type (at least different to me) ;)

import requests

with requests.Session() as s:
    session = requests.session
    url = 'http://exsite.pl//'
    payload = {'login_name': 'xxx', 'login_password': 'yyy!'}
    p = s.post(url, data=payload)
    #print(p.text)
    r = s.get('http://www.exsite.pl/filmy_video_movies/filmy-dvdrip-brrip/1378773-ukryte-piekno-collateral-beauty-2016-plsubbed480pbrripxvidac3-krt-napisy-pl.html')
    print(r.text)

Solution

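  • Working example for https://www.planetplus.pl/. Compared with the code in the question, two things change: the page holding the token is fetched with session.get instead of requests.get, so the cookie that an ASP.NET __RequestVerificationToken is normally checked against stays in the same session as the login POST; and the credentials are posted to the login endpoint (logowanie) instead of the home page.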

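    import bs4
    import requests


    BASE_URL = 'https://www.planetplus.pl/'
    LOGIN_URL = BASE_URL + 'logowanie'


    # Use ONE Session for every request, so cookies set while serving the
    # form are sent back with the login POST and with all later requests.
    with requests.Session() as session:
        res = session.get(BASE_URL)

        # Read the anti-forgery token from the hidden <input> on the page.
        soup = bs4.BeautifulSoup(res.text, 'lxml')
        token = soup.find('input', attrs={'name': '__RequestVerificationToken'})['value']
        payload = {'UserName': '6r5anl+fnmps358bvh8@sharklasers.com', 'Password': 'qwerty',
                   '__RequestVerificationToken': token}

        # Post to the login endpoint, not to the home page.
        login_res = session.post(LOGIN_URL, data=payload)
        login_res.raise_for_status()  # fail loudly on HTTP errors

        # The session now carries the authentication cookie.
        res = session.get(BASE_URL)
        print(res.text)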
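The code above hard-codes both the token field and the login URL. As a more general sketch (not the answerer's code), the helpers below read whichever form on the page contains a password input, take its action URL and the default value of every input, so a hidden __RequestVerificationToken is picked up automatically, and then post the form through the same session. This should also cover the second site: a form without a token simply has no hidden fields, but the payload still has to go to the form's action URL, which is not necessarily the site root that the question posts to. The names LoginFormParser, extract_login_form, has_login_form and login are hypothetical, the "first form with a password field is the login form" rule is an assumption, and parsing uses only the standard library's html.parser; login accepts a requests.Session or any object with compatible get and post methods.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LoginFormParser(HTMLParser):
    """Collect every <form> on a page with its action URL and <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []       # finished forms: dicts with action, fields, has_password
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # HTMLParser passes a list of (name, value) pairs
        if tag == 'form':
            self._flush()  # tolerate a missing </form> before the next form
            self._current = {'action': attrs.get('action') or '',
                             'fields': {}, 'has_password': False}
        elif tag == 'input' and self._current is not None:
            if (attrs.get('type') or '').lower() == 'password':
                self._current['has_password'] = True
            name = attrs.get('name')
            if name:
                # Keep the default value (e.g. a hidden anti-forgery token).
                self._current['fields'][name] = attrs.get('value') or ''

    def handle_endtag(self, tag):
        if tag == 'form':
            self._flush()

    def close(self):
        super().close()
        self._flush()  # tolerate a missing </form> at the end of the page

    def _flush(self):
        if self._current is not None:
            self.forms.append(self._current)
            self._current = None


def _parse_forms(html):
    parser = LoginFormParser()
    parser.feed(html)
    parser.close()
    return parser.forms


def extract_login_form(html, page_url):
    """Return (absolute action URL, default field values) of the first form
    that contains a password input, assumed to be the login form."""
    for form in _parse_forms(html):
        if form['has_password']:
            # An empty action means "submit to the page the form came from".
            return urljoin(page_url, form['action']), dict(form['fields'])
    raise ValueError('no form with a password field found')


def has_login_form(html):
    """Heuristic: if the page still shows a password field, login probably failed."""
    return any(form['has_password'] for form in _parse_forms(html))


def login(session, page_url, credentials):
    """Fetch the login form and submit it through the SAME session object,
    so cookies set while serving the form go back with the POST."""
    page = session.get(page_url)
    action, fields = extract_login_form(page.text, page.url)
    fields.update(credentials)  # overwrite the empty username/password defaults
    return session.post(action, data=fields)
```

For example, assuming the planetplus.pl home page carries the login form (as the working example implies), calling login(session, 'https://www.planetplus.pl/', {'UserName': 'xxx', 'Password': 'yyy'}) inside a with requests.Session() as session: block should log in; for exsite.pl the credentials dictionary would use login_name and login_password instead. Afterwards, has_login_form returning True for the protected page's HTML is a quick sign that the credentials or form fields were not accepted, which matches the symptom described in the question.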