ruby-on-rails ruby web-scraping ruby-on-rails-5 mechanize

How to scrape the icon link of an image using the Mechanize gem


I have a URL from which I need to scrape all images using the Mechanize gem, but some image URLs are in <link rel="icon"> tags rather than <img> tags.

I need to get the image URL from this tag:

<link rel="icon" href="https://mywebsite.com/wp-content/uploads/2021/10/cropped-favicon-32x32.png" sizes="32x32">

This is the code I tried, which only scrapes images from <img> tags. How can I get both kinds of image together?

require 'mechanize'
url = "https://mywebsite.com/"

agent = Mechanize.new
page = agent.get(url)

page.images.each do |image|
  puts image # prints the URL of every image found in an <img> tag
end

Solution

  • I looked at Mechanize::Page::Link, but it returns only the anchors.

    So I tried it with XPath:

    page.xpath('//link[contains(@rel, "icon")]').each do |icon|
      p icon.attr('href')
    end
    

    And received:

    "https://ownwebsite.com/wp-content/uploads/2021/10/cropped-favicon-32x32.png"
    "https://ownwebsite.com/wp-content/uploads/2021/10/cropped-favicon-192x192.png" 
    "https://ownwebsite.com/wp-content/uploads/2021/10/cropped-favicon-180x180.png"
    

    Here is a Replit that returns all the images.