ruby, mechanize

How do I resolve an HTTP 500 error while web scraping with Mechanize in Ruby?


I want to retrieve my driving license number, issue date, and expiry date from this website (https://sarathi.nic.in:8443/nrportal/sarathi/HomePage.jsp). When I try to fetch them, I get the following error:

    Mechanize::ResponseCodeError: 500 => Net::HTTPInternalServerError for https://sarathi.nic.in:8443/nrportal/sarathi/DlDetRequest.jsp -- unhandled response

This is the code that I wrote to scrape:

    require 'mechanize'
    require 'logger'
    require 'nokogiri'
    require 'open-uri'
    require 'openssl'

    OpenSSL::SSL::VERIFY_PEER = OpenSSL::SSL::VERIFY_NONE
    agent = Mechanize.new
    agent.log = Logger.new "mech.log"
    agent.user_agent_alias = 'Mac Safari 4'
    Mechanize.new.get("https://sarathi.nic.in:8443/nrportal/sarathi/HomePage.jsp")

    page = agent.get('https://sarathi.nic.in:8443/nrportal/sarathi/HomePage.jsp')  # open the home page
    page = agent.page.links.find { |l| l.text == 'Status of Licence' }.click          # click the link
    page.forms_with(:name => "dlform").first.field_with(:name => "dlform:DLNumber").value = "TN38 20120001119"  # user input to text field
    page.form_with(:name => "dlform").field_with(:name => "javax.faces.ViewState").value = "SUBMIT"  # assign the submit button value
    page.form(:name => "dlform", :action => "/nrportal/sarathi/DlDetRequest.jsp")  # specify the form I need
    agent.cookie_jar.clear!
    gg = agent.submit page.forms.last  # submit my form

Solution

  • It isn't working because you clear the cookies before submitting the form. That throws away the session cookie the server set when you loaded the page, so the server most likely can no longer match your submission to your session. I got it working simply by removing that line:

    ...

    form = page.form_with(:name => "dlform", :action => "/nrportal/sarathi/DlDetRequest.jsp")  # the form you need
    form.field_with(:name => "dlform:DLNumber").value = "TN38 20120001119"  # user input to text field
    gg = agent.submit(form, form.buttons.first)  # submit, passing the button to click

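    Note that you do not need to set a value for the submit button; instead, pass the button to agent.submit when you submit the form. The field you were overwriting, javax.faces.ViewState, is not the submit button but a hidden field that JSF uses to restore the page's state, so it should be sent back with the value the server put there.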
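The two points above can be illustrated without Mechanize. The sketch below uses only Ruby's standard library to mimic what a browser sends on the second request: the session cookie from the first response is echoed back in a Cookie header, and the form body keeps javax.faces.ViewState unchanged while adding the clicked button as its own name=value pair. It is a simplified illustration, not how Mechanize works internally. The cookie value, the ViewState value and the button name dlform:submit are made-up placeholders, and JSESSIONID is only the usual name of a Java servlet session cookie, not something confirmed for this site.

```ruby
require 'uri'

# Hypothetical placeholder values (assumptions, not taken from the real site):
# the session cookie would come from the Set-Cookie header of the first GET,
# the ViewState from the hidden input on the page, and the button name from the form.
SET_COOKIE    = 'JSESSIONID=abc123; Path=/nrportal; Secure'
VIEW_STATE    = 'j_id1:j_id2'
SUBMIT_BUTTON = ['dlform:submit', 'Submit']

# Keep only the "name=value" part of a Set-Cookie header, which is the part
# a client sends back in its Cookie header.
def cookie_pair(set_cookie)
  set_cookie.split(';', 2).first.strip
end

# Build a Cookie header from a simple name => value jar.
def cookie_header(jar)
  jar.map { |name, value| "#{name}=#{value}" }.join('; ')
end

# Build an urlencoded form body: the text field, the ViewState exactly as the
# server sent it, and the clicked submit button as its own name=value pair.
def form_body(dl_number, view_state, button)
  URI.encode_www_form([
    ['dlform:DLNumber', dl_number],
    ['javax.faces.ViewState', view_state], # sent back unchanged
    button                                 # the clicked submit button
  ])
end

jar = {}
name, value = cookie_pair(SET_COOKIE).split('=', 2)
jar[name] = value

puts cookie_header(jar)
puts form_body('TN38 20120001119', VIEW_STATE, SUBMIT_BUTTON)

# Clearing the jar before the POST leaves nothing to send back.
jar.clear
p cookie_header(jar)
```

After jar.clear the Cookie header is empty, which is roughly the situation the question's agent.cookie_jar.clear! created just before the form was posted.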