ruby nokogiri whitespace mechanize mechanize-ruby

I can't remove whitespace from a string parsed by Nokogiri


I can't remove whitespace from a string.

My HTML is:

<p class='your-price'>
Cena pro Vás: <strong>139&nbsp;<small>Kč</small></strong>
</p>

My code is:

#encoding: utf-8
require 'rubygems'
require 'mechanize'

agent = Mechanize.new
site  = agent.get("http://www.astratex.cz/podlozky-pod-raminka/doplnky")
price = site.search("//p[@class='your-price']/strong/text()")

val = price.first.text  # => "139 "
val.strip               # => "139 "
val.gsub(" ", "")       # => "139 "

gsub, strip, etc. don't work. Why, and how do I fix this?

val.class      # => String
val.dump       # => "\"139\\u{a0}\""   <- note the \u{a0}
val.encoding   # => #<Encoding:UTF-8>

__ENCODING__               # => #<Encoding:UTF-8>
Encoding.default_external  # => #<Encoding:UTF-8>

I'm using Ruby 1.9.3, so Unicode shouldn't be a problem.


Solution

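  • strip only removes ASCII whitespace, and gsub(" ", "") only matches the ordinary space (U+0020). The character you've got here is a Unicode non-breaking space (U+00A0), which is what the &nbsp; in your HTML decodes to, so neither call touches it.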

    Removing the character is easy. Pass gsub a regex that matches its code point:

    gsub(/\u00a0/, '')
    

    You could also call

    gsub(/[[:space:]]/, '')
    

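    to remove all Unicode whitespace, including the non-breaking space. Note that \s would not help here, because in Ruby it matches only ASCII whitespace. For details, check the Regexp documentation.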
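
The two approaches above can be sketched as follows. This is only an illustration under a few assumptions: the value is hard-coded as "139" followed by U+00A0 rather than fetched from the page, and unicode_strip is a hypothetical helper name, not something Ruby, Nokogiri, or Mechanize provides.

```ruby
# encoding: utf-8

# Stand-in for the text Nokogiri returned: "139" followed by U+00A0
# (hard-coded here so the example runs without a network request).
val = "139\u00a0"

# Inspecting the code points makes the invisible character visible:
# 160 (0xA0) is the non-breaking space.
codepoints = val.codepoints.to_a   # => [49, 51, 57, 160]

# Option 1: remove only the non-breaking space.
without_nbsp = val.gsub(/\u00a0/, '')   # => "139"

# Option 2: remove every Unicode whitespace character, wherever it appears.
without_space = " 139\u00a0\t".gsub(/[[:space:]]/, '')   # => "139"

# Hypothetical helper (the name is an assumption): a Unicode-aware
# equivalent of strip that trims whitespace from both ends only.
def unicode_strip(str)
  str.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')
end

price = unicode_strip(val).to_i   # => 139

puts codepoints.inspect
puts without_nbsp.inspect
puts without_space.inspect
puts price
```

Trimming only the ends, as unicode_strip does, may be preferable when the string can contain meaningful interior spaces; the plain gsub variants remove the character everywhere.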