Tags: ruby, recursion, mechanize, web-crawler

Use Mechanize to retrieve ALL links of a website


How can I use the Mechanize library to find all the links on a website?

I'd like to follow the internal links recursively so that I end up with every link on the site.


Solution

  • Have you looked at the Anemone gem? It was specifically created for spidering websites.

    You could do something like this to grab and print all the links of a website:

    require 'anemone'
    
    Anemone.crawl("http://www.example.com/") do |anemone|
      # on_every_page yields each page as it is crawled; focus_crawl would
      # instead expect the block to return the links Anemone should follow next
      anemone.on_every_page { |page| puts page.links }
    end
    

    It is fairly well documented, with options to control whether you spider the entire site, exclude certain types of links, or skip links matching a given pattern.
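If you do want to stay with Mechanize rather than pull in Anemone, the recursive part is a plain breadth-first traversal that you have to write yourself. Here is a minimal sketch of that traversal; the `fetch` callable is an assumption introduced for illustration (it is not a Mechanize API) so the logic can run without the network — with Mechanize you would pass something like `->(url) { agent.get(url).body }`, and use `page.links` instead of the naive `href` regex below.

```ruby
require 'set'
require 'uri'

# Breadth-first crawl that collects every internal link of a site.
# `fetch` is any callable returning the HTML body for a URL.
def crawl(start_url, fetch, max_pages: 100)
  root  = URI(start_url).host
  seen  = Set.new([start_url])
  queue = [start_url]

  until queue.empty? || seen.size >= max_pages
    url  = queue.shift
    html = fetch.call(url) rescue next   # skip pages that fail to load

    # Naive href extraction for the sketch; Mechanize's page.links is more robust.
    html.scan(/href="([^"]+)"/).flatten.each do |href|
      link = URI.join(url, href).to_s rescue next
      next unless URI(link).host == root # stay on the same site
      next if seen.include?(link)
      seen  << link
      queue << link
    end
  end

  seen.to_a
end
```

Because the fetcher is injected, you can exercise the traversal against an in-memory hash of pages before pointing it at a live site, and `max_pages` keeps a runaway crawl bounded.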