Tags: c++, json, c++11, utf-8, jsoncpp

Standard way of Serializing utf-8 characters in a JSON String


What's the standard way of serializing a UTF-8 string in JSON? Should it use a \u escape sequence, or should it be the hex code of the UTF-8 bytes?

I want to serialize some sensor readings with units in JSON format.

For example, I have temperature readings in °C. Should the unit be serialized as

```
{
 "units": "\u00b0"
}
```

or should it be something like

```
{
 "units": "c2b0"
}
```

Or are both of these supported by the standard?


Solution

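  • If JSON is exchanged between systems that are not part of a closed ecosystem, it must be encoded as UTF-8 (see RFC 8259, section 8.1). The UTF-16 and UTF-32 encodings allowed by earlier JSON specifications are no longer permitted. Because UTF-8 can represent the degree character directly, there is no need to escape it, and I strongly recommend against escaping unnecessarily.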

    Correct and recommended

    {
      "units": "°C"
    }
    

    Of course, the file or message must then actually be encoded as UTF-8, in which case the degree sign is stored as the two bytes 0xC2 0xB0.

    If JSON is used within a closed ecosystem, you can use other text encodings (though I would recommend against it unless you have a very good reason). If you need to escape the degree character because your encoding cannot represent it, the correct escape sequence is \u00b0.

    Possible but not recommended

    {
      "units": "\u00b0C"
    }
    

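    Your second approach is incorrect under all circumstances: "c2b0" is just an ordinary four-character string, and no JSON parser will interpret it as a degree sign.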

    Incorrect

    {
      "units":"c2b0"
    }
    

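    It is also incorrect to use something like "\xc2\xb0". This is the escaping used in C/C++ source code, and debuggers also use it to display strings. In JSON it is always invalid, because JSON has no \x escape; the only escapes it defines are \", \\, \/, \b, \f, \n, \r, \t and \uXXXX.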

    Incorrect as well

    {
      "units":"\xc2\xb0"
    }