javascript, unicode

Using charCodeAt(0) for an emoji returns the wrong number


The following doesn't seem correct:

"🚀".charCodeAt(0);  // returns 55357 in both Firefox and Chrome

That's the Unicode character named ROCKET (U+1F680); its decimal code point should be 128640.

This is for a Unicode app I am writing. It seems most, but not all, characters from Unicode 6 are stuck at 55357.

How can I fix it?


Solution

  • JavaScript strings use UTF-16 encoding; see this article for details:

    Characters outside the BMP, e.g. U+1D306 tetragram for centre (𝌆), can only be encoded in UTF-16 using two 16-bit code units: 0xD834 0xDF06. This is called a surrogate pair. Note that a surrogate pair only represents a single character.

    The first code unit of a surrogate pair is always in the range from 0xD800 to 0xDBFF, and is called a high surrogate or a lead surrogate.

    The second code unit of a surrogate pair is always in the range from 0xDC00 to 0xDFFF, and is called a low surrogate or a trail surrogate.

    You can decode the surrogate pair like this:

    var codePoint = (text.charCodeAt(0) - 0xD800) * 0x400 + (text.charCodeAt(1) - 0xDC00) + 0x10000;
    

    Complete code can be found in the Mozilla documentation for charCodeAt.
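
    For example, here is a minimal worked sketch (the variable names are mine, purely for illustration) that applies that formula to the rocket emoji from the question:

    // Decode the surrogate pair that JavaScript stores for "🚀" (U+1F680)
    var text = "🚀";

    var high = text.charCodeAt(0);  // 0xD83D (55357), the lead surrogate
    var low  = text.charCodeAt(1);  // 0xDE80 (56960), the trail surrogate

    var codePoint = (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;

    console.log(codePoint);               // 128640
    console.log(codePoint.toString(16));  // "1f680"

    charCodeAt always returns 16-bit code units rather than code points, which is why the first call on its own gives 55357.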