javascript, json, unicode

Meaning of escaped unicode characters in JSON


In JSON, Unicode characters can be escaped using the \uXXXX notation. I assume XXXX refers to a Unicode code point written in hexadecimal.

But since there are only four hex digits, does this mean there is no way to escape code points greater than 0xFFFF?

Or does \uXXXX not actually encode abstract code points, but rather UTF-16 code units (as in UTF-16-BE encoded text)?


Solution

  • Update 4/28/2024

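    Since ECMAScript 2015 (ES6), JavaScript string literals accept escapes like \u{X} through \u{XXXXXX} (one to six hex digits, up to \u{10FFFF}) to represent any code point directly, including those greater than 0xFFFF. Note that this applies to JavaScript source code only: the braced form is not valid JSON, and JSON.parse rejects it.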

    var s = '\u{2f804}';
    alert(s + '::' + s.length); // 你::2
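    A quick way to see the difference between the two contexts, assuming Node.js or a browser console (console.log is used instead of alert): the braced escape works in a JavaScript literal, JSON.parse rejects it, and the surrogate-pair form is accepted by JSON.parse. tryParseJson is a hypothetical helper written only for this illustration.

```javascript
// In a JavaScript (ES2015+) string literal, \u{...} is accepted.
const fromLiteral = '\u{2F804}';

// Hypothetical helper: parse JSON text and report success or the error name.
function tryParseJson(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (e) {
    return { ok: false, error: e.name };
  }
}

// JSON text only allows \uXXXX with exactly four hex digits,
// so the braced form inside a JSON document is a syntax error.
const braced = tryParseJson('"\\u{2F804}"');

// The same character written as a UTF-16 surrogate pair is valid JSON.
const pair = tryParseJson('"\\uD87E\\uDC04"');

console.log(braced.ok);                              // false
console.log(braced.error);                           // SyntaxError
console.log(pair.ok && pair.value === fromLiteral);  // true
console.log(fromLiteral.length);                     // 2 (two UTF-16 code units)
```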
    

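    In JSON the escape is always \uXXXX with exactly four hex digits, and yes, as you suspected, each escape stands for a UTF-16 code unit rather than a code point. A character greater than 0xFFFF is written as its high and low surrogates, i.e. two consecutive \uXXXX escapes. RFC 8259 (section 7) specifies this 12-character form for characters outside the Basic Multilingual Plane.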

    var s = '\uD87E\uDC04';
    alert(s + '::' + s.length); // 你::2
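    The surrogate arithmetic behind '\uD87E\uDC04' can be sketched as follows, assuming the standard UTF-16 scheme: subtract 0x10000, put the top 10 bits of the result into a high surrogate (0xD800 + ...) and the bottom 10 bits into a low surrogate (0xDC00 + ...). jsonEscapeCodePoint is a hypothetical helper name, not a built-in.

```javascript
// Hypothetical helper: build the JSON escape sequence for one code point.
// BMP code points (<= 0xFFFF) need a single \uXXXX escape; code points
// above 0xFFFF are split into a UTF-16 surrogate pair, one escape per unit.
function jsonEscapeCodePoint(cp) {
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10FFFF) {
    throw new RangeError('not a Unicode code point: ' + cp);
  }
  // Format a 16-bit value as \uXXXX (uppercase hex, zero-padded).
  const hex4 = (n) => '\\u' + n.toString(16).toUpperCase().padStart(4, '0');
  if (cp <= 0xFFFF) {
    return hex4(cp);
  }
  const offset = cp - 0x10000;            // 20-bit value
  const high = 0xD800 + (offset >> 10);   // top 10 bits -> high surrogate
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits -> low surrogate
  return hex4(high) + hex4(low);
}

const escaped = jsonEscapeCodePoint(0x2F804);
console.log(escaped);                                          // \uD87E\uDC04
console.log(JSON.parse('"' + escaped + '"') === '\u{2F804}');  // true
console.log(jsonEscapeCodePoint(0x1D11E));                     // \uD834\uDD1E
```

    JSON.stringify leaves most non-ASCII characters unescaped by default, so a helper like this is mainly useful when ASCII-only JSON output is required.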