How can I use the Mechanize library to find all the links on a website?
I'd like to follow the internal links recursively in order to collect every link on the site.
Have you looked at the Anemone gem? It was specifically created for spidering websites.
You could do something like this to grab and print all the links of a website:
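require 'anemone'

Anemone.crawl("http://www.example.com/") do |anemone|
  # on_every_page runs the block once per fetched page.
  # (focus_crawl is for choosing which links to follow, and its block
  # must return an array of links, which `puts` does not.)
  anemone.on_every_page { |page| puts page.links }
end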
It is fairly well documented, with options to control whether you spider the entire site, exclude certain types of links, or skip links that match a pattern.
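As a rough sketch of how those options might fit together: the version below limits crawl depth with the `:depth_limit` option, skips links whose path matches a pattern via `skip_links_like`, and collects the links instead of printing them. The `crawl_links` method name, the `SKIP_PATTERN` constant, and the list of file extensions are my own choices for illustration, not part of Anemone, and the gem is old enough that you may need to check it against your Ruby version.

```ruby
require 'set'

# Hypothetical pattern: skip links to binary assets. Anemone checks
# skip patterns against the path portion of each link.
SKIP_PATTERN = /\.(pdf|zip|jpe?g|png|gif)\z/i

# Hypothetical helper: crawl start_url and return every link found,
# as a sorted array of unique strings.
def crawl_links(start_url, depth_limit: 3)
  # Required here so the gem is only loaded when a crawl is actually run.
  require 'anemone'

  links = Set.new
  Anemone.crawl(start_url, depth_limit: depth_limit) do |anemone|
    # Do not follow links whose path matches the pattern.
    anemone.skip_links_like(SKIP_PATTERN)

    # page.links holds the same-domain links found on the page, as URI objects.
    anemone.on_every_page do |page|
      page.links.each { |uri| links << uri.to_s }
    end
  end
  links.to_a.sort
end
```

Calling `puts crawl_links("http://www.example.com/")` would then print the collected links once the crawl finishes. Note that `page.links` only contains links on the same domain as the page, which matches the "internal links" requirement in the question.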