ruby screen-scraping nokogiri

Image scraping in Ruby


How do I scrape an image from a particular URL using Nokogiri? If there are better options than Nokogiri, please suggest them. The CSS selector for the image is .profilePic img


Solution

  • If it is just an <img> with a URL:

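    require 'nokogiri'
    require 'open-uri'

    PAGE = "http://site.com/page.html"

    # Fetch and parse the page. On Ruby 3.0+ Kernel#open no longer
    # accepts URLs, so call URI.open (provided by open-uri) instead.
    html = Nokogiri::HTML(URI.open(PAGE))
    # Read the src attribute of the first image matching the selector.
    src  = html.at('.profilePic img')['src']
    # Download the image and save it in binary mode.
    File.open("foo.png", "wb") do |f|
      f.write(URI.open(src).read)
    end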
    

    If you need to turn a relative image path into an absolute URL, see:
    https://stackoverflow.com/a/4864170/405017
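
As a rough sketch of that relative-to-absolute step, one option is URI.join from Ruby's standard uri library. The linked answer may take a different approach. The helper name absolute_image_url and the example paths below are hypothetical, made up for illustration:

```ruby
require 'uri'

# Hypothetical helper (not from the original answer): resolve an <img> src
# value against the URL of the page it was scraped from.
def absolute_image_url(page_url, src)
  URI.join(page_url, src).to_s
end

page = "http://site.com/dir/page.html"   # assumed page URL for illustration

# A path relative to the page's directory
puts absolute_image_url(page, "images/me.png")
# A root-relative path
puts absolute_image_url(page, "/images/me.png")
# An already absolute URL is returned unchanged
puts absolute_image_url(page, "http://other.com/me.png")
```

The result can then be passed to URI.open in place of the raw src value. Note that URI.join raises URI::InvalidURIError if src is not a valid URI (for example, if it contains unescaped spaces), so real-world pages may need the value escaped first.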