Tags: ruby-on-rails, ruby, screen-scraping, render, mechanize

Ruby gem Mechanize


Is it possible to use the render method of a controller to render the content of a Mechanize object? I tried:

def new
  a = Mechanize.new
  a.get('http://flickr.com/')

  render :html => a.current_page
end

This throws an error, as do render :text => a, render :text => a.page, and render :text => a.current_page.

I understand that render is not expecting a Mechanize object; I just don't know what it does expect or how to produce it.

I am in the early stages of development and am researching web scraping frameworks for Ruby, so any help would be appreciated.


Solution

  • Try the body method, which returns the raw HTML of the page as a String:

    require 'mechanize'

    agent = Mechanize.new
    page = agent.get('http://www.example.net')
    page.body[0..100]
    # => "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml"
    

    You can also dig deeper into the document using Nokogiri. Mechanize is built on Nokogiri, so you can get at the parsed document Nokogiri creates and use CSS or XPath selectors to locate sub-sections of it. Once you've found what you want, call to_html to have Nokogiri emit the HTML for the node or node set. See "extract single string from html using ruby/mechanize (and nokogiri)" for more information.

    While that will work, consider whether reusing the content directly on your page violates the site's terms of service or copyright.