Tags: ruby, string, encoding, hpricot

Strange symbols in a web page's source


I have a problem: I am trying to parse a web page that is in UTF-8 and contains Russian text, using Hpricot.

The problem is that the Russian text comes back with some strange symbols, and I get an error when I try to convert it (with iconv) from UTF-8 to windows-1251 or ASCII.

This is the page: http://market.yandex.ru/model-spec.xml?modelid=929123&hid=90548

Here is the code:

require 'rubygems'
require 'open-uri'
require 'hpricot'

url = "http://market.yandex.ru/model-spec.xml?modelid=929123&hid=90548"
# Fetch the page body (on Ruby 3.0+ this call has to be written URI.open(url))
f = open(url).read
doc = Hpricot(f)
# Select every property-title cell and print its inner HTML
html = doc.search("th.b-properties__title")
html.each do |h|
  puts h.inner_html
end

The source is in UTF-8, but it contains several strange symbols such as "\u{2192}".
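
For what it's worth, "\u{2192}" is the escaped form of U+2192 (the rightwards arrow), which is a legal UTF-8 character but has no equivalent in windows-1251 or ASCII, so a strict conversion is expected to fail on it. Below is a minimal sketch of that behaviour. It uses Ruby's built-in String#encode rather than the iconv call mentioned above, and the sample string is made up for illustration, not taken from the page.

```ruby
# Hypothetical sample: the Russian word for "price" followed by U+2192.
# Escapes are used so the snippet does not depend on the source file encoding.
text = "\u0446\u0435\u043D\u0430 \u2192 100"

# A strict conversion raises because U+2192 has no windows-1251 mapping.
strict_error = nil
begin
  text.encode("Windows-1251")
rescue Encoding::UndefinedConversionError => e
  strict_error = e.class
end

# A lenient conversion replaces unmappable characters with "?".
converted = text.encode("Windows-1251", undef: :replace, replace: "?")

# Convert back to UTF-8 to see what survived the round trip.
roundtrip = converted.encode("UTF-8")
puts roundtrip
```

Replacing with "?" loses the arrow, so this only makes sense if the target really has to be windows-1251. If the output can stay in UTF-8, no conversion is needed at all.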


Solution

  • So, I solved it. I was using PowerShell on Windows with chcp 65001 to output everything in UTF-8, and that was the problem!
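
Since the answer points at the console setup rather than at the page data, one way to tell the two apart is to inspect the string without relying on how the terminal draws it. This is only a sketch of that check. The variable names are made up, and the string stands in for text scraped from the page.

```ruby
# Stand-in for a scraped string that contains the character from the question.
s = "\u2192"

# If the data is fine, it is tagged as UTF-8 and its bytes are well-formed.
puts s.encoding
puts s.valid_encoding?

# Dump the raw bytes as hex so the result is readable on any console.
hex = s.bytes.map { |b| format("%02X", b) }.join(" ")
puts hex

# The encoding Ruby assumes for external I/O in the current environment;
# its value depends on the console and locale settings.
puts Encoding.default_external
```

If the string reports UTF-8 and valid bytes, the scraped data is intact and the odd symbols come from the display side.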