Tags: ruby, screen-scraping, web-crawler, nokogiri

Save all image files from a website


I'm creating a small app for myself where I run a Ruby script and save all of the images off of my blog.

I can't figure out how to save the image files after I've identified them. Any help would be much appreciated.

require 'nokogiri'
require 'open-uri'

url = '[my blog url]'
doc = Nokogiri::HTML(URI.open(url)) # Kernel#open no longer works on URLs in Ruby 3+

doc.css("img").each do |item|
  #something
end

Solution

    URL = '[my blog url]'

    require 'nokogiri' # gem install nokogiri
    require 'open-uri' # part of the Ruby standard library

    Nokogiri::HTML(URI.open(URL)).xpath("//img/@src").each do |src|
      uri = URI.join(URL, src).to_s # make the URI absolute
      File.open(File.basename(uri), 'wb') { |f| f.write(URI.open(uri).read) }
    end

    This uses the technique for converting relative paths to absolute URLs from: How can I get the absolute URL when extracting links using Nokogiri?
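The `URI.join` call is what makes the loop work for relative `src` attributes. A minimal sketch of that resolution step, pulled out into helpers (the helper names and example URLs are hypothetical, for illustration only):

```ruby
require 'uri'

# Resolve a possibly-relative img src against the page URL.
def absolute_image_uri(page_url, src)
  URI.join(page_url, src).to_s
end

# Derive a local filename from the URI's path component, so a
# query string like "?v=2" doesn't end up in the saved filename.
def local_filename(uri)
  File.basename(URI.parse(uri).path)
end
```

Note that `File.basename(uri)` on the raw URI string, as in the answer above, would keep any query string in the filename; taking `URI.parse(uri).path` first avoids that.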