python web-scraping beautifulsoup two-factor-authentication

Using BeautifulSoup for pages behind 2 factor auth


I am scraping some data for a company project, but all of it sits behind the 2-factor authentication my company has in place. The 2-factor authentication requires me to enter a code from my phone/hardware token that lasts 6 seconds. It cannot be disabled for a variety of reasons.

Is there any way I can scrape this information? If I run my script right now, BeautifulSoup just returns the login page (where I have to enter my username/password before being taken to the 2-factor page).

If needed, I can also enter the 2-factor info manually, although this would have to be repeated every 12 hours, so this method is not preferred. However, I have not had any success with this either: BeautifulSoup does not read from a browser that is already logged in, and the 2-factor code changes every 6 seconds or so and with every login. Since I need to visit multiple different pages, this is basically no more viable than saving each page as HTML manually.


Solution

  • As commenters have noted, this depends on how your site sets and checks the login status. In addition to the method in the answer you linked, you should try the following options:

    import requests

    # using a session, and the cookies argument
    s = requests.Session()
    r = s.get('https://someurl', cookies={'somecookie': 'somecookievalue'})
    
    # using a session, and http headers
    s = requests.Session()
    r = s.get('https://someurl', headers={'somekey': 'somevalue'})
    

    In both of the above cases, the cookies or headers are key/value pairs expressed as a Python dictionary. Multiple cookies or headers can be passed as multiple key/value pairs. In some cases, auth credentials must be passed directly (a tuple assigned to `s.auth` is sent as HTTP Basic authentication), like this:

    # using a session, and credentials sent with every request
    s = requests.Session()
    s.auth = ('user', 'pass')
    r = s.get('https://someurl')
    

    Lastly, some combination of two or more of these may be required. Without your code or more info about the website, it's difficult to say more. I hope all this helps.
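
One way to get the cookie dictionary for the `cookies` argument is to log in through the browser (including the 2-factor step), copy the value of the `Cookie` request header from the developer tools, and convert it. The following is a minimal sketch using only the standard library. The helper name `cookie_header_to_dict` and the cookie names and values are hypothetical, and whether the site accepts a session cookie replayed this way depends on how it checks login status.

```python
from http.cookies import SimpleCookie


def cookie_header_to_dict(header: str) -> dict:
    """Turn a copied 'Cookie:' header value into a {name: value} dictionary."""
    jar = SimpleCookie()
    jar.load(header)  # parses "name=value; name2=value2"
    return {name: morsel.value for name, morsel in jar.items()}


# Hypothetical header value copied from the browser's developer tools
copied = "sessionid=abc123; csrftoken=xyz789"
cookies = cookie_header_to_dict(copied)
print(cookies)
```

The resulting dictionary can be passed as `cookies=cookies` to `s.get(...)`. The header has to be copied again once the session expires.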