ruby-on-rails json encoding unicode ruby-on-rails-4

Why does to_json escape unicode automatically in Rails 4?


Rails 3:

{"a" => "<br/>"}.to_json
=> "{\"a\":\"<br/>\"}"

Rails 4:

{"a" => "<br/>"}.to_json
=> "{\"a\":\"\\u003Cbr/\\u003E\"}"

WHY???

It appears to be causing the error

Encoding::UndefinedConversionError: "\xC3" from ASCII-8BIT to UTF-8

when my Rails 3 app tries to parse JSON generated by my Rails 4 app.


Solution

  • WHY???

    To defend against a common weakness in web applications. If you write something like this in an HTML page:

    <script type="text/javascript">
        var something = <%= @something.to_json.html_safe %>;
    </script>
    

    then you might think you're fine because you've JSON-escaped the data you're injecting into JavaScript. But actually you're not safe: aside from JSON syntax you also have the surrounding HTML syntax, and in an HTML script block the sequence </ is in-band signalling, because the HTML parser looks for the closing tag without regard to JavaScript string literals. In practice, if @something contains the string </script>, you've got a cross-site scripting vulnerability, because this comes out:

    <script type="text/javascript">
        var something = {"attack": "abc</script><script>alert('XSS');//"};
    </script>
    

    The first script block ends halfway through the string (leaving an unclosed string literal, which is a syntax error), and the second <script> is treated as a new script block, so the potentially user-submitted content within it is executed.

    Escaping the < character to \u003C is not required by JSON but it is a perfectly valid alternative and it automatically avoids this class of problems. If a JSON parser rejects it, that is a severe bug in the reader.
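
To illustrate the point about \u003C being a valid alternative, here is a minimal sketch using only Ruby's standard json library, not Rails itself. The helper name html_safe_json is hypothetical, and the post-processing with gsub is just one way to imitate the escaping; it is not a claim about how Rails implements it.

```ruby
require 'json'

# Hypothetical helper (not part of Rails or the json gem): generate JSON,
# then replace HTML-significant characters with \uXXXX escapes.
# In valid JSON these characters can only occur inside string literals,
# so rewriting them as escapes does not change the decoded data.
def html_safe_json(value)
  JSON.generate(value).gsub(/[<>&]/) { |char| format('\u%04X', char.ord) }
end

payload = { "attack" => "abc</script><script>alert('XSS');//" }
escaped = html_safe_json(payload)

puts escaped
# The literal closing tag no longer appears in the output...
puts escaped.include?("</script>")
# ...and a conforming parser still recovers the original data.
puts JSON.parse(escaped) == payload
```

A parser that round-trips the escaped form like this is behaving correctly, which is why the escaping on its own should not break a well-behaved consumer.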

    What is the code that is producing that error? I'm not convinced the error is anything to do with the <-escaping, as it is talking about byte 0xC3 rather than 0x3C. That could be indicative of a string with UTF-8 encoded content not having been marked as UTF-8... maybe you need a force_encoding("UTF-8") on the input?
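
As a rough sketch of the force_encoding suggestion: the sample bytes below are an assumption made for illustration (the actual failing input is not shown in the question), but they reproduce the same class of error when UTF-8 bytes are tagged as ASCII-8BIT.

```ruby
# Assumed sample input: the UTF-8 bytes for "café", but tagged as binary
# (ASCII-8BIT), as can happen with data read from a socket or HTTP body.
raw = "caf\xC3\xA9".b

begin
  # Transcoding from ASCII-8BIT fails on the first high byte, 0xC3.
  raw.encode("UTF-8")
rescue Encoding::UndefinedConversionError => e
  puts "conversion failed: #{e.message}"
end

# force_encoding does not change the bytes; it only relabels them as UTF-8.
fixed = raw.dup.force_encoding("UTF-8")
puts fixed.valid_encoding?
puts fixed == "caf\u00E9"
```

Note that force_encoding is only appropriate when the bytes really are UTF-8; checking valid_encoding? afterwards is a cheap way to confirm that.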