I need to parse this page https://www.petsonic.com/snacks-huesos-para-perros/ and retrieve information from every item (name, price, image, etc.). The problem is that I don't know how to parse an array of URLs. If I were using 'open-uri' I would do something like this:
require 'nokogiri'
require 'open-uri'

page = "https://www.petsonic.com/snacks-huesos-para-perros/"
doc = Nokogiri::HTML(URI.open(page))
links = doc.xpath('//a[@class="product-name"]/@href')

links.to_a.each do |url|
  # url is an attribute node, so pass its string value to URI.open
  doc2 = Nokogiri::HTML(URI.open(url.value))
  text = doc2.xpath('//a[@class="product-name"]').text
  puts text
end
However, I am only allowed to use 'curb', which is what has me confused.
You can use the curb gem:
gem install curb
Then, in your Ruby script:
require 'curb'

page = "https://www.petsonic.com/snacks-huesos-para-perros/"
str = Curl.get(page).body
links = str.scan(/<a(.*?)<\/a>/).flatten.select { |l| l[/class="product-name/] }
inner_text_of_links = links.map { |l| l[/(?<=>).*/] }
puts inner_text_of_links
The hard part of this was the regex, so let's break it down. To get the links, we scan the string for <a> tags; scan returns one sub-array per match, so we flatten them into a single array:
str.scan(/<a(.*?)<\/a>/)
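For example, on a small fragment (the sample HTML below is made up for illustration):

```ruby
html = '<a class="product-name" href="/x">Dog Bone</a> <a class="nav">Home</a>'
p html.scan(/<a(.*?)<\/a>/)
# Each match comes back wrapped in its own sub-array because of the
# capture group, which is why the snippet above calls flatten.
```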
Then we select only the items that contain the class you specified:
.select { |l| l[/class="product-name/] }
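On the flattened array this keeps just the product links, because String#[] with a regex returns the matched substring, or nil (falsey) when there is no match (sample strings below are illustrative):

```ruby
links = [' class="product-name" href="/x">Dog Bone', ' class="nav">Home']
p links.select { |l| l[/class="product-name/] }
# Only the product link survives the filter.
```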
Now, to get the inner text of each tag, we map it using a lookbehind regex:
inner_text_of_links = links.map { |l| l[/(?<=>).*/] }
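To then walk the array of URLs (your original question), you can also pull the href out of each matched tag and fetch every product page with Curl.get, mirroring your open-uri loop. A sketch, assuming the hrefs on the page are absolute URLs; product_links is a hypothetical helper, not part of curb:

```ruby
# Hypothetical helper: pull { url, name } pairs out of an HTML string,
# using the same scan/select approach as above.
def product_links(html)
  html.scan(/<a(.*?)<\/a>/).flatten
      .select { |l| l[/class="product-name/] }
      .map { |l| { url: l[/(?<=href=")[^"]+/], name: l[/(?<=>).*/] } }
end

# Network part, mirroring the open-uri loop (uncomment to run):
# require 'curb'
# page = Curl.get("https://www.petsonic.com/snacks-huesos-para-perros/").body
# product_links(page).each do |item|
#   detail = Curl.get(item[:url]).body  # item[:url] assumed absolute
#   puts item[:name]
#   # ...scan `detail` the same way for price, image, etc.
# end
```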