Unicode surrogate pairs and String.fromCodePoint() — JavaScript


I'm dealing with raw strings containing escape sequences for the surrogate halves of astral Unicode symbols. (I think I got that lingo right…)

console.log("\uD83D\uDCA9")
// => 💩

Let's use the above emoji as an example. If I have the surrogate pair (\uD83D\uDCA9), how can I take its hexadecimal values and turn them into a valid argument for JavaScript's String.fromCodePoint() function?

I've tried the following:

const codePoint = ["D83D", "DCA9"].reduce((acc, cur) => {
    return acc += parseInt(cur, 16);
}, 0);

console.log(String.fromCodePoint(codePoint));
// => 𛓦 (some weird symbol appears, not 💩!)

PS: I'm familiar with ES6 code point escape sequences, which put the hexadecimal value between braces (\u{…}) instead of using surrogate halves. But I need to do this with surrogate pairs!

Any suggestions are greatly appreciated.


Solution

  • You can pass a list of values to the function:

    console.log(String.fromCodePoint(0xd83d, 0xdca9));
    

    Thus a "valid argument" for String.fromCodePoint() is not necessarily a single value, and indeed for a character that requires a surrogate pair it by definition cannot be a single value. Why? Because each individual numeric source value, as far as String.fromCodePoint() is concerned, must be a 16-bit (2-byte) value. If you could pass bigger single numbers, there would be no need for surrogate pairs!

    Edit: much of the above paragraph is inaccurate. String.fromCodePoint() accepts full Unicode code point values, which can be larger than 16 bits (anything up to 0x10FFFF). The result is still stored as a surrogate pair because JavaScript strings are UTF-16, but if you already have full-size code points you don't have to split them up yourself, which is nice. If you already have the pair, however, there's no need to combine the halves yourself, because the method also works when the two halves are passed as consecutive arguments.
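
If you do want the single code point anyway (say, to print it as U+1F4A9), the usual UTF-16 decoding arithmetic can be sketched as below. The helper name surrogatePairToCodePoint is made up for this illustration and is not a built-in; it assumes it is given one high surrogate followed by one low surrogate.

```javascript
// Hypothetical helper (not a built-in): combine a UTF-16 surrogate pair
// into the single code point it encodes.
function surrogatePairToCodePoint(high, low) {
  if (high < 0xD800 || high > 0xDBFF || low < 0xDC00 || low > 0xDFFF) {
    throw new RangeError("expected a high surrogate followed by a low surrogate");
  }
  // Each half carries 10 bits of the value; astral code points start at 0x10000.
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

const codePoint = surrogatePairToCodePoint(0xD83D, 0xDCA9);
console.log(codePoint.toString(16)); // 1f4a9
console.log(String.fromCodePoint(codePoint) === "\uD83D\uDCA9"); // true
```

This also shows why summing the two halves, as in the question, lands on a different character: the halves are bit fields with offsets, not plain addends. If the pair is already in a string, "\uD83D\uDCA9".codePointAt(0) returns the same number without any manual arithmetic.
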

    If you have values in an array, you can invoke the function with apply:

    var points = [0xd83d, 0xdca9];
    console.log(String.fromCodePoint.apply(String, points));
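
In ES2015 and later, spread syntax does the same job as apply. A minimal sketch, assuming the input is an array of hexadecimal strings like the one in the question (the variable names are illustrative):

```javascript
// Hex strings for the two surrogate halves, as in the question.
const halves = ["D83D", "DCA9"];

// Parse each half on its own instead of summing them, then spread the
// resulting numbers as separate arguments.
const parsedPoints = halves.map((hex) => parseInt(hex, 16));
const emoji = String.fromCodePoint(...parsedPoints);

console.log(emoji === "\uD83D\uDCA9"); // true
console.log(emoji.codePointAt(0).toString(16)); // 1f4a9
```
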