ruby mongodb mongoid padrino

Why is \xF3 not recognized as UTF-8?


I have this hash:

a={"topic_id"=>60693, "urlkey"=>"innovacion", "name"=>"Innovaci\xF3n"}

When I try to save it to MongoDB using Mongoid, I get this error:

BSON::InvalidStringEncoding: String not valid UTF-8

I then tried to replace the offending byte with gsub:

a["name"].gsub(/\xF3/,"o")

but that fails with: SyntaxError: (pry):12: too short escaped multibyte character: /\xF3/

I have added a magic comment at the beginning of my model file: # encoding: UTF-8


Solution

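  • Hexadecimal 0xF3 by itself is not valid UTF-8. In UTF-8, bytes greater than 0x7F only ever appear as part of a multi-byte sequence, and 0xF3 is the lead byte of a four-byte sequence, so it must be followed by three continuation bytes. What makes you think the string should be UTF-8?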

    You can read up on the allowable sequences here: http://en.wikipedia.org/wiki/UTF-8#Description

    If you need the Ruby string to carry an encoding that allows arbitrary byte sequences, you can force it to binary:

    str.force_encoding("BINARY")
    

    With a binary encoding, #gsub and other string operations that rely on valid encodings will work on a byte-by-byte basis, instead of a character-by-character basis.
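
As a rough sketch of the byte-wise approach described above, the snippet below applies it to the string from the question. The helper names replace_f3_bytewise and latin1_to_utf8 are hypothetical and not part of the original answer. The second helper rests on an assumption the question does not confirm: that the data was originally ISO-8859-1 (Latin-1), where the single byte 0xF3 is ó.

```ruby
# encoding: UTF-8

# Hypothetical helper (not from the original answer): replaces the raw byte
# 0xF3 with "o" by treating the string as bytes rather than characters.
def replace_f3_bytewise(str)
  bytes = str.dup.force_encoding("BINARY") # dup leaves the caller's string untouched
  cleaned = bytes.gsub(/\xF3/n, "o")       # the n flag makes the regexp binary as well
  cleaned.force_encoding("UTF-8")          # valid here because only ASCII bytes remain
end

# Hypothetical alternative, ASSUMING the bytes were originally ISO-8859-1:
# relabel the bytes as Latin-1, then transcode them to UTF-8.
def latin1_to_utf8(str)
  str.dup.force_encoding("ISO-8859-1").encode("UTF-8")
end

name = "Innovaci\xF3n"

puts name.valid_encoding?       # false
puts replace_f3_bytewise(name)  # Innovacion
puts latin1_to_utf8(name)       # Innovación
```

The byte-wise replacement drops the accent, while transcoding keeps it but is only correct if the source really was Latin-1. Either way the resulting string is valid UTF-8, which is what the BSON error in the question was complaining about.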