javascript file xmlhttprequest emoji surrogate-pairs

Emoji surrogate-pair string in JavaScript: how to parse it?


I want to display emoji with JavaScript. I have a file like this:

:-),\ud83d\ude03
^^,\ud83d\ude03
^_^,\ud83d\ude03
:),\ud83d\ude03
:D,\ud83d\ude03

Each line contains a key and an emoji surrogate pair as its value. I read this file and, if the input string matches a key, replace the input with the corresponding emoji.

For example, typing "^^" should be replaced with a smiley.

The strange part is that if I put the same information in an object in the script, the emoji prints correctly:

this.emojiStore.osx = {
                //smile
                ':-)' : '\ud83d\ude03'
                , '^^' : '\ud83d\ude03'
                , '^_^' : '\ud83d\ude03'
                , ':)' : '\ud83d\ude03'
                , ':D' : '\ud83d\ude03'
                //frown
                , ':(' : '\ud83d\ude1e'
                //crying
                , 'T^T' : '\ud83d\ude22'
                , 'T_T' : '\ud83d\ude22'
                , 'ㅜㅜ' : '\ud83d\ude22'
                , 'ㅠㅠ' : '\ud83d\ude22'
                //poo 
                , 'shit' : '\ud83d\udca9'
        };

and the replacement part looks like this:

this.value = emojiList[key];
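
For context, the lookup-and-replace step described above could be sketched roughly as follows. The function name `replaceEmoticon` and the small `emojiList` object are illustrative assumptions, not part of my actual code; the sketch only replaces the input when the whole trimmed input equals a key.

```javascript
// Hypothetical helper: returns the emoji when the whole input matches a key,
// otherwise returns the input unchanged.
function replaceEmoticon(input, emojiList) {
    var key = input.trim();
    // hasOwnProperty avoids matching inherited names such as "toString".
    if (Object.prototype.hasOwnProperty.call(emojiList, key)) {
        return emojiList[key];
    }
    return input;
}

// Illustrative data, written as string literals so the escapes are decoded.
var emojiList = { '^^': '\ud83d\ude03', ':(': '\ud83d\ude1e' };

console.log(replaceEmoticon('^^', emojiList));    // the smile emoji
console.log(replaceEmoticon('hello', emojiList)); // hello
```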

But when I read the data from the file, it prints the literal string '\ud83d\ude22' instead of the emoji.

How can I turn a surrogate-pair string into an emoji in JS? (I do not want to use third-party libraries.)

FYI, the JS file and the target file are both encoded as UTF-8.

======== File Loading Part

function loadFile(url){
    var ret = {};
    var rawFile = new XMLHttpRequest();
//    rawFile.overrideMimeType('text/html; charset=utf-8');
    // third argument false = synchronous request, so ret is filled before return
    rawFile.open("GET", url, false);
    rawFile.onreadystatechange = function (){
        if(rawFile.readyState === 4){
            // status 0 covers files loaded from the local file system
            if(rawFile.status === 200 || rawFile.status === 0) {
                var allText = rawFile.responseText;
                var textByLine = allText.split('\n');
                for(var i = 0; i < textByLine.length; i++){
                    if(textByLine[i].trim().length < 1) continue;
                    // each line is "key,value"
                    var parts = textByLine[i].split(',');
                    var key = parts[0].trim();
                    var value = parts[1].trim();
                    ret[key] = value;
                }
            }
        }
    };
    rawFile.send(null);
    console.log(ret);
    return ret;
}

=========== Edited

I found a hint.

When I read from the file, \u shows up as \\u, whereas a string written in the script stays as it is.

i.e.

  • file version : \ud83d\ude03 becomes \\ud83d\\ude03
  • script version : \ud83d\ude03 stays as is

So the point is how to prevent \ from turning into \\.

I still have not found the answer, though.
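
As far as I can tell, nothing is actually being changed: the text from the file really contains a backslash character followed by "u" and four hex digits, and the console just displays that backslash in escaped form. A small sketch that makes the difference visible (the variable names `fromScript` and `fromFile` are only illustrative, and `fromFile` simulates the file content with a literal):

```javascript
// A string literal: the JavaScript parser turns each \uXXXX escape
// into a single UTF-16 code unit.
var fromScript = '\ud83d\ude03';

// Simulated file content: a backslash, the letter "u", and four hex digits, twice.
// The doubled backslashes are only how that text is written as a literal.
var fromFile = '\\ud83d\\ude03';

console.log(fromScript.length);                     // 2
console.log(fromFile.length);                       // 12
console.log(fromFile.charAt(0) === '\\');           // true
console.log(fromScript.charCodeAt(0).toString(16)); // d83d
```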


Solution

  • I think I found the answer.

    Refer to the following link: How do I decode a string with escaped unicode?

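    function parseUnicode(str){
        // match a literal backslash, "u", and exactly four hex digits
        var r = /\\u([0-9a-f]{4})/gi;
        str = str.replace(r, function (match, grp) {
            // convert the hex digits to the UTF-16 code unit they name
            return String.fromCharCode(parseInt(grp, 16));
        });
        return str;
    }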
    

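    For reference, a JS string literal can differ from the string that comes from a file. \uXXXX escapes are only interpreted by the JavaScript parser inside string literals; text read from a file keeps the backslash and the characters after it as they are. I wrote a function that prints each character, and the results differ between the two.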

    function charAnalyst(str){
    
        var result = '';
        for(var i = 0; i < str.length; i++){
            var aChar = str.charAt(i);
            result += aChar;
            console.log(aChar);
        }
        console.log(result);
    }
    

    I hope this saves you some time :D
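
Putting the pieces together, here is a minimal sketch of decoding the values while the file lines are parsed. `decodeEscapes` and `parseEmojiLines` are hypothetical names used only for this example, the `sample` string stands in for `responseText`, and splitting on the last comma assumes the value itself never contains a comma.

```javascript
// Same idea as parseUnicode, restricted to hex digits.
function decodeEscapes(str) {
    return str.replace(/\\u([0-9a-f]{4})/gi, function (match, hex) {
        return String.fromCharCode(parseInt(hex, 16));
    });
}

// Hypothetical helper: turns "key,value" lines into a lookup object,
// decoding the \uXXXX escapes in each value.
function parseEmojiLines(allText) {
    var ret = {};
    var lines = allText.split('\n');
    for (var i = 0; i < lines.length; i++) {
        var line = lines[i].trim();
        if (line.length < 1) continue;
        // Split on the last comma (assumes the value has no comma in it).
        var comma = line.lastIndexOf(',');
        if (comma < 0) continue;
        var key = line.slice(0, comma).trim();
        ret[key] = decodeEscapes(line.slice(comma + 1).trim());
    }
    return ret;
}

// Simulated file content: the doubled backslashes stand for the single
// backslash characters that are really in the file.
var sample = ':-),\\ud83d\\ude03\n^^,\\ud83d\\ude03\n';
var store = parseEmojiLines(sample);

console.log(store['^^'] === '\ud83d\ude03'); // true
console.log(store[':-)'].length);            // 2
```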