ruby-on-rails ruby screen-scraping mechanize

How can I get the size or weight of an image from its URL?


This follows up on the previous question "methods width and height Mechanize".

I would like to know how I can get the size of a web page's images with Mechanize.

I have written a helper method for this, but the process is very slow. For example:

require 'mechanize'

url = "http://www.birchbox.com"
page = Mechanize.new.get(url)
images_urls = page.images.map { |img| img.url.to_s }

This is the helper method:

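require 'mini_magick'

def check_image_size(images_urls)
  # reject builds a new array, so nothing is deleted from images_urls
  # while it is being iterated (deleting mid-loop skips elements).
  images_urls.reject do |image_url|
    # Downloads the whole image, which is what makes this slow.
    image = MiniMagick::Image.open(image_url)
    image[:width] < 100
  end
end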

This method filters out every image that is narrower than 100px.

The problem with this method is that the process is very slow. My page takes too long to load with this method.

Is there any quick and easy way to do this with Mechanize?


Solution

  • If you want the real size of the image, you're going to have to fetch it.

    As you noted, that can take a long time. One way to speed this up is to not fetch the whole image, but to read it progressively and parse it as it arrives. You can stop reading as soon as you have enough of it to determine its size.

    That's fairly complicated and won't always help: PNG and GIF keep their dimensions in the first few bytes, but in a JPEG they sit in a frame header that can come after large metadata blocks, so you may have to read much more of the file before you know the size.
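
The partial-fetch idea can be sketched roughly as follows. This is only an illustration under some assumptions: it uses Net::HTTP from the standard library rather than Mechanize, the names `fetch_leading_bytes` and `dimensions_from_header` and the 64-byte limit are made up for this example, redirects are not followed, and only PNG and GIF are parsed (JPEG would need a marker scan and more bytes).

```ruby
require 'net/http'
require 'uri'

PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b

# Hypothetical helper: returns [width, height] parsed from the leading bytes
# of a PNG or GIF, or nil if the format is unrecognized or the data is too short.
def dimensions_from_header(bytes)
  bytes = bytes.to_s.b
  if bytes.bytesize >= 24 && bytes.start_with?(PNG_SIGNATURE)
    # PNG: the IHDR chunk follows the 8-byte signature; width and height
    # are big-endian 32-bit integers at byte offsets 16 and 20.
    bytes.byteslice(16, 8).unpack('NN')
  elsif bytes.bytesize >= 10 && bytes.start_with?('GIF87a'.b, 'GIF89a'.b)
    # GIF: width and height are little-endian 16-bit integers at offsets 6 and 8.
    bytes.byteslice(6, 4).unpack('vv')
  end
end

# Hypothetical helper: reads at most `limit` bytes of the body, then returns,
# which unwinds out of Net::HTTP.start and closes the connection.
def fetch_leading_bytes(url, limit = 64)
  uri = URI.parse(url)
  buffer = String.new
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    request = Net::HTTP::Get.new(uri)
    # Ask for just the first bytes; a server is free to ignore this header.
    request['Range'] = "bytes=0-#{limit - 1}"
    http.request(request) do |response|
      return nil unless response.is_a?(Net::HTTPSuccess)
      response.read_body do |chunk|
        buffer << chunk
        # Stop early instead of downloading the rest of the image.
        return buffer.byteslice(0, limit) if buffer.bytesize >= limit
      end
    end
  end
  buffer
end
```

With these two helpers, the filter from the question could become something like `images_urls.reject { |u| (d = dimensions_from_header(fetch_leading_bytes(u))) && d[0] < 100 }`, which keeps any image whose size could not be determined. Whether that is the right fallback depends on the page.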