javascript · unicode · emoji

Generating an emoji from a Unicode code point and then extracting the code point gives a different value



one and two are created from different code point values, but they produce the same emoji.

The interesting part is that the value extracted back from one is completely different from the one it was created with.

one = String.fromCodePoint(parseInt("1f436", 16))
two = String.fromCodePoint(parseInt("d83d", 16), parseInt("dc36", 16))
one === two    // true
one.length     // 2
one.codePointAt(0).toString(16) + '-' + one.codePointAt(1).toString(16)  // "1f436-dc36"
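
Checking the raw UTF-16 code units with charCodeAt seems to show where the second pair of values comes from:

one.charCodeAt(0).toString(16)   // "d83d" (high surrogate)
one.charCodeAt(1).toString(16)   // "dc36" (low surrogate)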

Solution

  • Code point values range from U+0000 to U+10FFFF.


    Originally, JavaScript only had Unicode escape sequences, which consist of exactly four hexadecimal digits, so they can only represent code points from U+0000 to U+FFFF.

    e.g.:

    >> 'I \u2661 JavaScript!'
    'I ♡ JavaScript!'
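
    Because the escape is always exactly four digits, an astral code point such as U+1F4A9 cannot be written this way; the parser consumes four digits and treats anything after them as a literal character (a quick illustration, taken from the article linked below):

    >> '\u1F4A9'
    'Ὂ9' // parsed as '\u1F4A' (a Greek character) followed by the literal digit '9'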
    

    In ES6, JS introduced Unicode code point escapes.

    >> '\u{1F4A9}'
    '💩' // U+1F4A9 PILE OF POO
    

    Between the braces, you can use up to six hexadecimal digits, which is enough to represent all Unicode code points.
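
    To see the upper bound in practice (a hypothetical console session; the exact RangeError message varies by engine):

    >> '\u{10FFFF}'.length
    2 // valid: the highest code point, stored as two code units
    >> String.fromCodePoint(0x110000)
    RangeError // anything above U+10FFFF is rejected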

    For backward compatibility with ECMAScript 5 and older environments, the unfortunate solution is to use surrogate pairs:

    >> '\uD83D\uDCA9'
    '💩' // U+1F4A9 PILE OF POO
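
    The two halves are not arbitrary: they are computed from the code point, which is exactly the relationship between 1f436 and d83d/dc36 in the question. Here is a minimal sketch of the standard UTF-16 encoding step (toSurrogatePair is just an illustrative name, not a built-in):

    function toSurrogatePair(codePoint) {
      const offset = codePoint - 0x10000;      // distance above the BMP
      const high = 0xD800 + (offset >> 10);    // high (lead) surrogate
      const low  = 0xDC00 + (offset & 0x3FF);  // low (trail) surrogate
      return [high.toString(16), low.toString(16)];
    }

    >> toSurrogatePair(0x1F436)
    ['d83d', 'dc36']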
    

    ECMAScript 6 also introduces String.prototype.codePointAt(position), which deals with surrogate halves whenever possible and returns the numeric code point value:

    >> '💩'.codePointAt(0)
    128169 // 0x1F4A9, i.e. U+1F4A9 PILE OF POO
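
    Applied to the question's string: only index 0 carries the full code point; index 1 sees the lone low surrogate, which is why one.codePointAt(1) returned dc36. Iterating by code point instead of by index (for...of and the spread operator are code-point-aware) avoids this:

    >> [...'🐶'].map(c => c.codePointAt(0).toString(16))
    ['1f436'] // a single code point, even though '🐶'.length === 2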
    

    source: https://mathiasbynens.be/notes/javascript-unicode

    video: https://www.youtube.com/watch?v=zi0w7J7MCrk