I have a string that contains XML. It includes the following substring:
<Subject>&#55357;&#56898;&#55357;&#56838;&#55357;&#56846;&#55357;&#56838;&#55357;&#56843;&#55357;&#56838;&#55357;&#56843;&#55357;&#56832;&#55357;&#56846;</Subject>
I'm pulling the XML from a server and I need to display it to the user. I've noticed that the ampersand has been escaped and that the values are UTF-16 surrogate pairs. How do I ensure the emojis/emoticons are displayed correctly in a browser?
Currently I'm just getting these characters: �������������� instead of the actual emojis.
I'm looking for a simple way to fix this without any external libraries or third-party code if possible: just plain old JavaScript, HTML, or CSS.
You can convert UTF-16 code units, including surrogates, to a JavaScript string with String.fromCharCode. The following code snippet should give you an idea.
var str = '&#55357;&#56898;ABC&#55357;&#56838;&#55357;&#56846;&#55357;&#56838;&#55357;&#56843;&#55357;&#56838;&#55357;&#56843;&#55357;&#56832;&#55357;&#56846;';
// Regex matching either a decimal character reference or any single UTF-16 code unit.
// [\s\S] (rather than [^&]) keeps a stray '&' instead of silently dropping it.
var re = /&#(\d+);|([\s\S])/g;
var match;
var charCodes = [];
// Find successive matches.
while ((match = re.exec(str)) !== null) {
    if (match[1] != null) {
        // Character reference: here a UTF-16 code unit, typically a surrogate.
        charCodes.push(parseInt(match[1], 10));
    }
    else {
        // Unescaped character, taken as a single UTF-16 code unit.
        charCodes.push(match[2].charCodeAt(0));
    }
}
// Create string from UTF-16 code units.
var result = String.fromCharCode.apply(null, charCodes);
console.log(result);
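The same idea can probably be written more compactly with String.prototype.replace, which leaves all unescaped text untouched. This is only a sketch: the function name decodeNumericEntities is made up for illustration, and the support for hexadecimal references (&#x...;) is an assumption, since the sample only contains decimal ones.

```javascript
// Hypothetical helper; the name is illustrative and not part of any library.
function decodeNumericEntities(input) {
    // Match decimal (&#NNN;) or, as an assumption, hexadecimal (&#xHHH;) references.
    return input.replace(/&#(?:x([0-9a-fA-F]+)|(\d+));/g, function (whole, hex, dec) {
        var code = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
        // Leave out-of-range values as they are instead of throwing a RangeError.
        if (code > 0x10FFFF) {
            return whole;
        }
        // Values up to 0xFFFF (surrogates included) are emitted as UTF-16 code units;
        // an adjacent high/low surrogate pair then forms one character in the result.
        return code <= 0xFFFF ? String.fromCharCode(code) : String.fromCodePoint(code);
    });
}

var decoded = decodeNumericEntities('&#55357;&#56898;ABC&#55357;&#56838;');
console.log(decoded);
```

In a browser you would then assign the decoded string to an element's textContent (rather than building HTML from it), and make sure the page itself is served as UTF-8, for example with a meta charset tag.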