Search code examples
rubyerror-handlingtimeoutopen-uri

Using Open-URI to fetch XML and the best practice in case of problems with a remote url not returning/timing out?


Current code works as long as there is no remote error:

def get_name_from_remote_url
      cstr = "http://someurl.com"
      getresult = open(cstr, "UserAgent" => "Ruby-OpenURI").read
      doc = Nokogiri::XML(getresult)
      my_data = doc.xpath("/session/name").text
      #  => 'Fred' or 'Sam' etc
      return my_data
end

But, what if the remote URL times out or returns nothing? How I detect that and return nil, for example?

And, does Open-URI give a way to define how long to wait before giving up? This method is called while a user is waiting for a response, so how do we set a max timeoput time before we give up and tell the user "sorry the remote server we tried to access is not available right now"?


Solution

  • Open-URI is convenient, but that ease of use means they're removing the access to a lot of the configuration details the other HTTP clients like Net::HTTP allow.

    It depends on what version of Ruby you're using. For 1.8.7 you can use the Timeout module. From the docs:

    require 'timeout'
    begin
    status = Timeout::timeout(5) {
      getresult = open(cstr, "UserAgent" => "Ruby-OpenURI").read
    }
    rescue Timeout::Error => e
      puts e.to_s
    end
    

    Then check the length of getresult to see if you got any content:

    if (getresult.empty?)
      puts "got nothing from url"
    end
    

    If you are using Ruby 1.9.2 you can add a :read_timeout => 10 option to the open() method.


    Also, your code could be tightened up and made a bit more flexible. This will let you pass in a URL or default to the currently used URL. Also read Nokogiri's NodeSet docs to understand the difference between xpath, /, css and at, %, at_css, at_xpath:

    def get_name_from_remote_url(cstr = 'http://someurl.com')
      doc = Nokogiri::XML(open(cstr, 'UserAgent' => 'Ruby-OpenURI'))
    
      # xpath returns a nodeset which has to be iterated over
      # my_data = doc.xpath('/session/name').text #  => 'Fred' or 'Sam' etc  
    
      # at returns a single node
      doc.at('/session/name').text
    end