
Ruby - Nokogiri, open-uri - Failing to parse page


This code works on some pages, like klix.ba, but I can't figure out why it doesn't work on others.

There is no error message explaining what went wrong; the script just prints nothing.

If `puts page` works, which means I can reach the page and parse it, why can't I get individual elements?

require 'nokogiri'
require 'open-uri'


url = 'http://www.olx.ba/'

user_agent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7"


page = Nokogiri::XML(open(url,'User-Agent' => user_agent), nil, "UTF-8")

# puts page  # this line works

puts page.xpath('a')

Solution

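  • First of all, why are you parsing it as XML? Real-world HTML is rarely well-formed XML, so the XML parser silently recovers from the errors and the resulting tree may not contain the elements you expect. Since your page is an HTML website, the following should be correct: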

    # URI.open is required on Ruby 3.0+, where open-uri no longer hooks Kernel#open
    page = Nokogiri::HTML(URI.open(url, 'User-Agent' => user_agent), nil, "UTF-8")
    

    Furthermore, if you want to extract all the links (<a> tags), use a CSS selector:

    page.css('a').each do |element|
      puts element
    end