I want to scrape some information that sits behind a login page, but I get a 503 (Service Unavailable) response.
This is what happens when I try to log in with mechanicalsoup (same result with robobrowser):
>>> import mechanicalsoup
>>> browser = mechanicalsoup.StatefulBrowser(user_agent='Mozilla/5.0')
>>> page = browser.get('https://X.com')
>>> page.status_code
200
>>> page = browser.get('https://X.com/wp-login.php')
>>> page.status_code
503
I've tried a couple of different user_agent values. How can I get around this? By moving cookies around?
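One way to check whether a 503 like this comes from Cloudflare's anti-bot challenge rather than from the site itself is to inspect the response. The helper below is a heuristic sketch, not an official API: it assumes the challenge page is marked by a 503 status, a Server header starting with "cloudflare", and the jschl_vc / jschl_answer form fields, similar to the check cloudflare-scrape performs. The function name is hypothetical, and Cloudflare may change its challenge page at any time.

```python
def looks_like_cloudflare_challenge(status_code, headers, body):
    """Heuristic: True if a response looks like Cloudflare's JS challenge page.

    Assumption: the challenge is a 503 with a "Server: cloudflare..." header
    and the jschl_vc / jschl_answer fields in the HTML body (bytes).
    """
    server = ""
    for name, value in headers.items():
        if name.lower() == "server":  # header names are case-insensitive
            server = value.lower()
            break
    return (
        status_code == 503
        and server.startswith("cloudflare")
        and b"jschl_vc" in body
        and b"jschl_answer" in body
    )
```

With the session above, it could be called as looks_like_cloudflare_challenge(page.status_code, page.headers, page.content), since browser.get() returns a requests Response.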
OK, I managed to do this using https://github.com/Anorov/cloudflare-scrape, which solves Cloudflare's anti-bot challenge before handing back the page:
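import cfscrape
import mechanicalsoup

# CloudflareScraper is a requests.Session subclass that solves Cloudflare's
# challenge (the source of the 503) before returning the page
scraper = cfscrape.CloudflareScraper()
scraper.get('https://X.com/wp-login.php')
# get_tokens returns (cookie dict, user agent); the user agent must match
# the one used to obtain the Cloudflare clearance cookie
tokens = cfscrape.get_tokens('https://X.com')
# reuse the cleared session so MechanicalSoup sends the clearance cookies
browser = mechanicalsoup.StatefulBrowser(session=scraper, user_agent=tokens[1])
# log in: open the login page first so select_form() has a current page
browser.open('https://X.com/wp-login.php')
browser.select_form('#loginform')
browser['log'] = 'X'
browser['pwd'] = 'X'
browser.submit_selected()
browser.open('https://X.com/page/')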
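If you would rather keep a plain MechanicalSoup session instead of passing in the scraper (the "moving cookies around" route from the question), the tokens from cfscrape.get_tokens() can be copied onto it. This is a minimal sketch with a hypothetical helper name; it assumes the session behaves like a requests.Session (cookies accepts a dict-style update(), headers is a mapping). The cfscrape README notes that the same User-Agent must be used for obtaining the tokens and for the requests that use them.

```python
def apply_cloudflare_tokens(session, tokens):
    """Copy (cookies, user_agent) as returned by cfscrape.get_tokens() onto a session.

    Assumes `session` behaves like a requests.Session: `session.cookies`
    supports update(dict) and `session.headers` is a mapping.
    """
    cookies, user_agent = tokens
    session.cookies.update(cookies)
    # Cloudflare flags requests whose User-Agent differs from the one that
    # solved the challenge, so send the same one on every request.
    session.headers["User-Agent"] = user_agent
    return session


# Example usage (needs network access, so left commented out):
# import cfscrape, mechanicalsoup
# browser = mechanicalsoup.StatefulBrowser()
# apply_cloudflare_tokens(browser.session, cfscrape.get_tokens('https://X.com'))
# browser.open('https://X.com/wp-login.php')
```

Reusing the scraper session, as in the code above, is simpler because cfscrape can re-solve the challenge if the clearance cookie expires; copied tokens will eventually go stale.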