
How can I count the number of outbound links a page has?


I'm learning web scraping with Ruby. I'm trying to count the number of outbound links on a given page, but I'm not sure how to tell Ruby that I only want the outbound links counted.

My current code:

require "open-uri"

# Collect info
puts "What is your URL?"
url = gets.chomp
puts "Your URL is #{url}"
puts "Loading..."

# Count links
# Kernel#open no longer fetches URLs as of Ruby 3.0, so call URI.open instead
page = URI.open(url).read
link_total = page.scan("</a>")
# obl_count = ???
link_count = link_total.count
puts "Your site has a total of #{link_count} links."

How can I complete this?


Solution

  • You should never parse HTML with regular expressions or string matching; use Nokogiri to do the dirty work for you instead.

    In simple terms, you can use CSS selectors to find tags. From there it's easy to count every link on the page:

    require "nokogiri"

    Nokogiri::HTML(page).css('a').length
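
The count above includes every link, internal or not. To keep only the outbound ones, one option is to resolve each href against the page URL and compare hosts. The sketch below is not from the original answer: `outbound?` and `count_outbound` are hypothetical helper names, and it assumes "outbound" means "an http(s) link whose host differs from the page's host" (so `www.example.com` and `example.com` would count as different sites). It uses only Ruby's standard `uri` library and takes a plain array of href strings.

```ruby
require "uri"

# Hypothetical helper: true when href points at a different host than base_url.
# Assumption: only http/https links on another host count as outbound.
def outbound?(href, base_url)
  return false if href.nil?

  base = URI.parse(base_url)
  # Resolve relative hrefs ("/about", "#top", "//cdn.host/x") against the page URL
  target = URI.join(base_url, href)
  # Skip mailto:, javascript:, tel: and similar non-web schemes
  return false unless target.is_a?(URI::HTTP) # URI::HTTPS is a subclass

  target.host.to_s.downcase != base.host.to_s.downcase
rescue URI::Error
  # Malformed hrefs (e.g. containing spaces) are not counted
  false
end

# Hypothetical helper: number of outbound links in a list of href strings.
def count_outbound(hrefs, base_url)
  hrefs.count { |href| outbound?(href, base_url) }
end
```

With Nokogiri, the href list could be built as `Nokogiri::HTML(page).css('a[href]').map { |a| a['href'] }` and passed in as `count_outbound(hrefs, url)`, which would fill in the `obl_count` placeholder from the question.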