ruby, web-scraping, nokogiri, mechanize, typhoeus

Submitting login fields during a scraping process with ruby?


I need to scrape some financial data from a system called NetTeller.

An example can be found here.

Note the initial ID field prompt:

[Image: ID field]

Once you submit it, you then have to enter your password:

[Image: Password field]

As you can see, it is a two-step process: you first enter an ID number, and only after submitting it are you presented with a password field. I'm hitting some roadblocks getting through these two steps to reach the data I actually want. How would one handle a scenario like this, where you need to pass through the authentication fields before getting to the data you want to scrape?

I had assumed I could just use httpclient and nokogiri, but I'm curious whether there are any tricks for dealing with a two-page login like this before reaching the target.


Solution

  • I would use Mechanize. The first page is "tricky" because the login form sits inside an iframe, so request the URL that the iframe loads rather than the outer page. Here is how:

    require 'mechanize'

    agent = Mechanize.new
    
    # Get first page (the URL the iframe loads, not the outer page)
    iframe_url = 'https://www.banksafe.com/sfonline/'
    page = agent.get(iframe_url)
    login_form = page.forms.first
    # Fill in the ID. 'ID' is a placeholder: use the input's real name attribute.
    login_form.field_with(:name => 'ID').value = '12345678'
    
    # Get second page
    response = login_form.submit
    second_login_form = response.forms.first
    # Fill in the password, looking the field up by its input type
    second_login_form.field_with(:type => 'password').value = 'xxxxx'
    
    # Get page to scrape
    response = second_login_form.submit
    

    This is how you could handle a scenario like this. You will likely need to adapt the code to how those forms and fields are actually named, and to other page-specific details, but this is the approach I would take.