ruby-on-rails, ruby, user-interface, shoes

How do I get content from a website using Ruby / Rails?


I want to copy some specific content from a website using Ruby/Rails. The content I need is inside a marquee HTML tag, divided by divs. How can I get access to this content using Ruby? To be more precise, I want to use some kind of Ruby GUI (preferably Shoes). How do I do it?


Solution

  • This isn't really a Rails question. It's something you'd do in Ruby, then possibly display using Rails, Sinatra, or Padrino (pick your poison).

    There are several different HTTP clients you can use:

    Open-URI comes with Ruby and is the easiest. Net::HTTP also comes with Ruby and is the standard toolbox, but it's lower-level, so you'd have to do more of the work yourself. HTTPClient and Typhoeus (with Hydra) can run requests concurrently and offer both high-level and low-level interfaces.
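
    For comparison, here is a minimal sketch of the lower-level Net::HTTP route. The `fetch_html` helper and its redirect limit are illustrative names, not part of any library. Open-URI follows most redirects for you, while with Net::HTTP you follow them yourself:

    ```ruby
    require 'net/http'
    require 'uri'

    # Fetch a page with Net::HTTP and return its body as a String.
    # `limit` caps how many redirects we are willing to follow.
    def fetch_html(url, limit = 5)
      raise ArgumentError, 'too many redirects' if limit.zero?

      response = Net::HTTP.get_response(URI.parse(url))
      case response
      when Net::HTTPSuccess
        response.body
      when Net::HTTPRedirection
        # The Location header may be relative, so resolve it against the current URL.
        fetch_html(URI.join(url, response['location']).to_s, limit - 1)
      else
        raise "request failed: #{response.code} #{response.message}"
      end
    end
    ```

    Calling `fetch_html('http://www.example.com')` would return the raw HTML, which you can then hand to a parser.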

    I recommend using Nokogiri to parse the returned HTML. It's very full-featured and robust.

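    require 'nokogiri'
    require 'open-uri'
    
    # URI.open comes from open-uri; plain open() no longer accepts URLs on Ruby 3.0+.
    doc = Nokogiri::HTML(URI.open('http://www.example.com'))
    
    # Print the parsed document back out as HTML.
    puts doc.to_html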
    
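    To pull out just the pieces the question asks about, a CSS selector is usually enough. A minimal sketch, assuming the page has `<div>` elements nested inside a `<marquee>`; the inline HTML below is a made-up stand-in for the downloaded page, and the real markup will likely differ:

    ```ruby
    require 'nokogiri'

    # Hypothetical stand-in for the page you downloaded.
    html = <<~HTML
      <html><body>
        <marquee>
          <div>First headline</div>
          <div>Second headline</div>
        </marquee>
        <div>Not part of the marquee</div>
      </body></html>
    HTML

    doc = Nokogiri::HTML(html)

    # 'marquee div' selects every <div> nested inside a <marquee>.
    items = doc.css('marquee div').map { |div| div.text.strip }

    puts items
    ```

    `items` is a plain array of strings, so it can be handed to whatever displays it, whether that is a GUI or a web page.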

    If you need to navigate through login screens or fill in forms before you get to the page you need to parse, then I'd recommend looking at Mechanize. It relies on Nokogiri internally so you can ask it for a Nokogiri document and parse away once Mechanize retrieves the desired URL.

    If the page builds its content with JavaScript (dynamic HTML), look into the various Watir tools. They drive a real web browser, then let you access the content as the browser sees it.

    Once you have the content or data you want, you can "repurpose" it into text inside a Rails page.
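
    Rails views are ERB templates, so the last step can be tried out with plain ERB from the standard library. This is only a sketch: the `items` array is hypothetical scraped content, and the explicit `html_escape` is there because plain ERB, unlike a Rails view, does not escape output for you:

    ```ruby
    require 'erb'

    # Hypothetical strings standing in for the scraped content.
    items = ['First headline', 'Breaking: 2 < 3']

    # trim_mode '-' lets <%- and -%> swallow the whitespace around loop tags.
    template = ERB.new(<<~TEMPLATE, trim_mode: '-')
      <ul>
      <%- items.each do |item| -%>
        <li><%= ERB::Util.html_escape(item) %></li>
      <%- end -%>
      </ul>
    TEMPLATE

    html = template.result_with_hash(items: items)
    puts html
    ```

    Escaping matters here because scraped text is untrusted input and may contain characters such as `<` that would otherwise be read as markup.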