I have this hash:
a={"topic_id"=>60693, "urlkey"=>"innovacion", "name"=>"Innovaci\xF3n"}
and I am trying to save it to MongoDB using Mongoid, but I get this error:
BSON::InvalidStringEncoding: String not valid UTF-8
I then tried to gsub it:
a["name"].gsub(/\xF3/,"o")
and I get: SyntaxError: (pry):12: too short escaped multibyte character: /\xF3/
I have added a magic comment at the beginning of my model file: # encoding: UTF-8
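Hexadecimal 0xF3 by itself is not valid UTF-8. In UTF-8, every byte above 0x7F belongs to a multi-byte sequence: 0xF3 is a lead byte that must be followed by three continuation bytes (0x80 to 0xBF), but here it is followed by "n". What makes you think the string should be UTF-8?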
You can read up on the allowable sequences here: http://en.wikipedia.org/wiki/UTF-8#Description
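A minimal Ruby sketch of the point above, assuming Ruby 2.0+ with the default UTF-8 source encoding. utf8_byte_role is a hypothetical helper that simplifies the byte-range table (it ignores the extra limits on the second byte after 0xE0, 0xED, 0xF0 and 0xF4). The last lines rest on a guess, not on anything in the question: if the data came from an ISO-8859-1 (Latin-1) source, where 0xF3 is "ó", transcoding keeps the character instead of deleting it.

```ruby
# Hypothetical helper: the role a single byte can play in UTF-8.
# Simplified: ignores the extra limits on the second byte after 0xE0, 0xED, 0xF0 and 0xF4.
def utf8_byte_role(byte)
  case byte
  when 0x00..0x7F then :ascii         # complete one-byte character
  when 0x80..0xBF then :continuation  # only valid after a lead byte
  when 0xC2..0xDF then :lead_2        # starts a 2-byte sequence
  when 0xE0..0xEF then :lead_3        # starts a 3-byte sequence
  when 0xF0..0xF4 then :lead_4        # starts a 4-byte sequence
  else :never_valid                   # 0xC0, 0xC1 and 0xF5..0xFF
  end
end

name = "Innovaci\xF3n"     # labelled UTF-8 (the source encoding), but the bytes are invalid
puts name.valid_encoding?  # prints false

# 0xF3 announces a 4-byte sequence, but the next byte is "n" (0x6E), a plain ASCII byte.
name.bytes.each { |b| printf("0x%02X %s\n", b, utf8_byte_role(b)) }

# Assumption only: if the source was ISO-8859-1 (Latin-1), 0xF3 is "ó",
# so relabel the bytes as Latin-1 and transcode them to real UTF-8.
fixed = name.dup.force_encoding("ISO-8859-1").encode("UTF-8")
puts fixed.valid_encoding?  # prints true
```

If the Latin-1 guess is right, the transcoded string should be accepted by BSON as-is, and no characters need to be stripped.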
If you need the Ruby string to use an encoding that accepts arbitrary byte sequences, you can force it to binary:
str.force_encoding("BINARY")
With a binary encoding, #gsub and other string operations that rely on valid encodings will work on a byte-by-byte basis instead of a character-by-character basis.
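A sketch of the binary approach, assuming Ruby 2.0+ (for String#b). The pattern is built as a binary string rather than a /\xF3/ regexp literal, because that literal in a UTF-8 source file is what raised "too short escaped multibyte character". BSON rejected the original string because it was not valid UTF-8; relabeling the result back to UTF-8 is safe here only because the cleaned bytes are pure ASCII.

```ruby
name = "Innovaci\xF3n"

# Relabel the same bytes as BINARY (ASCII-8BIT); no bytes are converted or lost.
# dup avoids mutating the original string in place.
bin = name.dup.force_encoding("BINARY")

# Build the pattern as a binary string too, so gsub compares raw bytes.
cleaned = bin.gsub("\xF3".b, "o")

# The result is pure ASCII, so relabeling it as UTF-8 gives a valid string.
utf8 = cleaned.force_encoding("UTF-8")

puts utf8                  # prints Innovacion
puts utf8.valid_encoding?  # prints true
```

Note that this replaces the byte with a plain "o" and drops the accent, so it only fits if losing the original character is acceptable.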