ruby-on-rails unicode webrick

Unicode failures in Rails over WEBrick


Rails 3.2.2
Ruby 1.9.3
all browsers I have tried (Aurora, IE, Chrome)
Windows 7

I have enjoyed being able to reliably pass Unicode back and forth between the database and the client. As far as I can tell, it works flawlessly.

However, that only makes the problem I am having more vexing. When I use string literals containing special characters inside Rails code (in a model method, for example), WEBrick fails completely and returns a 500 "We're sorry, but something went wrong." error.

For example, suppose I have the string "O₂". I can post that value into a form and read it back later, and everything comes out fine. But say I have a method:

def fix_name molecule_name 
  fixed_name = molecule_name
  case molecule_name
  when 'O2' then fixed_name = 'O₂'
  end
  return fixed_name
end

Then if I call fix_name, even if the case falls through without matching, the server fails abruptly (right after saying it has successfully rendered the generic new.html page).

Furthermore, if I switch to specifying the character with byte escapes, as in

def fix_name molecule_name 
  fixed_name = molecule_name
  case molecule_name
  when 'O2' then fixed_name = "O\x20\x82"
  end
  return fixed_name
end

I get the generic "�" character instead of "₂".

Has anyone else had this problem? What could be going on here?

Update

Okay, having educated myself a little better on Unicode and UTF-8 I am able to be a little less neanderthal about this.

The fix for the code I posted is either

def fix_name molecule_name 
  fixed_name = molecule_name
  case molecule_name
  when 'O2' then fixed_name = "O\xe2\x82\x82"
  end
  return fixed_name
end

or probably better:

def fix_name molecule_name 
  fixed_name = molecule_name
  case molecule_name
  when 'O2' then fixed_name = "O\u{2082}"
  end
  return fixed_name
end

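So the moral there is to get the byte sequence right: in UTF-8, U+2082 is the three bytes E2 82 82, not the two bytes 20 82 (which are a space followed by a stray continuation byte).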
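As a rough check of the reasoning above, the sketch below builds both byte sequences explicitly and asks Ruby whether each is valid UTF-8. The variable names are illustrative only and not from the original post; the strings are built with pack so the result does not depend on the file's source encoding.

```ruby
# Illustrative check only; variable names are hypothetical.

# Same bytes as "O\x20\x82": 'O', a space (0x20), and a lone 0x82.
bad = [0x4F, 0x20, 0x82].pack("C*").force_encoding(Encoding::UTF_8)

# Same bytes as "O\xe2\x82\x82": 'O' followed by the UTF-8 encoding of U+2082.
good = [0x4F, 0xE2, 0x82, 0x82].pack("C*").force_encoding(Encoding::UTF_8)

# 0x82 on its own is a continuation byte with no lead byte, so the
# first string is not valid UTF-8 and tends to display as "�".
bad_valid  = bad.valid_encoding?
good_valid = good.valid_encoding?

# The \u{...} escape always yields a UTF-8 string, so the two fixes
# from the post should compare equal.
same = (good == "O\u{2082}")

# Going the other way: ask Ruby which bytes encode the code point.
subscript_bytes = "\u{2082}".bytes.map { |b| format("%02x", b) }

puts "bad valid?  #{bad_valid}"
puts "good valid? #{good_valid}"
puts "fixes equal? #{same}"
puts "U+2082 bytes: #{subscript_bytes.join(' ')}"
```

Calling bytes on a known-good character is a quick way to find the right \x escapes, although the \u{2082} form avoids the need to work them out at all.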

But that still doesn't explain why I can't put the literal character in.


Solution

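  • Ruby 1.9.3 is, let us say, 'stringent' about encoding in your .rb files. Unless a file declares otherwise, Ruby 1.9 reads its source as US-ASCII, so a literal "₂" makes the whole file fail to load. That would explain why the error appears even when the case branch never matches.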

    http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/

    http://blog.grayproductions.net/articles/ruby_19s_three_default_encodings

    The short answer is that you should try adding:

    # encoding: UTF-8

    as the first line of your .rb file (and make sure the file is actually saved as UTF-8). Ruby 1.9.3 should then not choke on non-ASCII characters in your code.
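
    A minimal sketch of what such a file could look like, assuming it is saved as UTF-8. The method is a condensed version of the one in the question; `__ENCODING__` is Ruby's built-in keyword that reports the source encoding of the current file. Note that Ruby 2.0 and later default to UTF-8 source encoding, so there the comment is harmless but no longer required.

```ruby
# encoding: UTF-8
# The magic comment above has to be the first line of the file
# (or the second, if the first is a shebang line).

# Condensed from the question: returns the prettified name when one
# is known, otherwise the name unchanged.
def fix_name(molecule_name)
  case molecule_name
  when 'O2' then 'O₂'   # literal non-ASCII character in the source
  else molecule_name
  end
end

puts __ENCODING__      # source encoding Ruby is using for this file
puts fix_name('O2')
puts fix_name('H2O')
```

    If the magic comment is missing on Ruby 1.9, the file should fail to parse with an "invalid multibyte char (US-ASCII)" error rather than run at all.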