I am parsing CNN.com to get the top five news stories with their first paragraphs. I have the following code:
require 'mechanize'

url = "http://edition.cnn.com/?refresh=1"
agent = Mechanize.new
page = agent.get(url)
# Build an absolute URL for each headline link, then fetch each article
page.search("//div[@id='cnn_maintt2bul']/div/div/ul/li[count(*)=3]/a").map { |a| page.uri.merge a[:href] }.each do |uri|
  article = agent.get(uri).parser
  # Print the paragraph that immediately follows the .adtag15090 element
  puts article.css(".adtag15090+ p").text
  puts "\n"
end
It's not perfect, but it works. However, it retrieves all the articles, and I want only five. Is there a way, perhaps using ranges, to limit the number of results to five?
The simple way to do it is to add an array slice after search. Nokogiri returns a NodeSet from a search, and NodeSet supports []:
page.search("//div[@id='cnn_maintt2bul']/div/div/ul/li[count(*)=3]/a")[0, 5]...