Tags: python, web-scraping, mechanize, mechanicalsoup

503 error when logging in using Python MechanicalSoup


I want to scrape some information behind a login page, but requesting the login page returns a 503.

This is what happens when I try to log in with mechanicalsoup (same result with robobrowser):

>>> import mechanicalsoup
>>> browser = mechanicalsoup.StatefulBrowser(user_agent='Mozilla/5.0')
>>> page = browser.get('https://X.com')
>>> page.status_code
200
>>> page = browser.get('https://X.com/wp-login.php')
>>> page.status_code
503

I've tried a couple of different user agents. How can I get around this? Would moving cookies around help?


Solution

  • OK, I managed to do this using cfscrape (https://github.com/Anorov/cloudflare-scrape), a module for getting past Cloudflare's anti-bot page:

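    import cfscrape
    import mechanicalsoup

    # Pass the Cloudflare check with a cfscrape session
    scraper = cfscrape.CloudflareScraper()
    scraper.get('https://X.com/wp-login.php')
    tokens = cfscrape.get_tokens('https://X.com')  # returns (cookies, user_agent)

    # Reuse that session in MechanicalSoup, with the user agent from get_tokens
    browser = mechanicalsoup.StatefulBrowser(session=scraper, user_agent=tokens[1])

    # log in: the browser has to open the page before a form can be selected
    browser.open('https://X.com/wp-login.php')
    browser.select_form('#loginform')
    browser['log'] = 'X'
    browser['pwd'] = 'X'
    browser.submit_selected()

    # fetch a page that requires the login
    browser.open('https://X.com/page/')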
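
To check whether a 503 like the one in the question comes from Cloudflare's anti-bot page rather than from the site itself, a rough heuristic is to look at the status code together with the `Server` response header. This is only a sketch: the function name `looks_like_cloudflare_challenge` is made up for illustration, and the status codes and header prefix it tests are assumptions about how the challenge page typically identified itself, not a guaranteed contract.

```python
def looks_like_cloudflare_challenge(status_code, headers):
    """Heuristic guess: does this response resemble Cloudflare's challenge page?

    Assumption: the challenge is served with status 503 (or 429) and a
    Server header that starts with "cloudflare". Other setups may differ.
    """
    server = ""
    # Plain dicts are case-sensitive, so compare header names case-insensitively.
    for name, value in headers.items():
        if name.lower() == "server":
            server = value.lower()
            break
    return status_code in (503, 429) and server.startswith("cloudflare")
```

With the session from the question, this could be called as `looks_like_cloudflare_challenge(page.status_code, page.headers)`. If it returns `True`, changing the user agent alone is unlikely to help, which matches what the question describes.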