Tags: javascript, unicode, escaping, utf-16, rawstring

In JavaScript, is there a way to iterate over the lexical tokens of a string?


Given this string, which I receive from an endpoint:

"\u0000\u0000\u0000\u001A%<some-random-text\fcdtoolHxxx1-34e3-4069-b97c-xxxxxxxxxxx\u001E\n"

I would like to iterate over the string and escape every sequence that starts with \u. The resulting string would be:

"\\u0000\\u0000\\u0000\\u001A%<some-random-text\fcdtoolHxxx1-34e3-4069-b97c-xxxxxxxxxxx\\u001E\n"

Notice how \f and \n aren't escaped. So, how can I escape only those \u sequences?

A regular expression like the one below will not work, because it also replaces \f and \n, which should be left untouched.

function escapeUnicode(str) {
  // Replaces every control character in U+0000..U+001F, including \f and \n
  return str.replace(/[\u0000-\u001F]/gu, function (chr) {
    return "\\u" + ("0000" + chr.charCodeAt(0).toString(16)).slice(-4);
  });
}

There's String.raw, but it only works when the string is written as a template literal. For instance, with a literal I could do:

let s = String.raw`\u0000\u0000\u0000\u001A%<deployment-deploymentStepStart\fcdtoolHb3dccc41-8cf0-4069`;
var escaped = String.raw``;

for (let i = 0, j = i + 1; i < s.length - 1; i++, j = i + 1) {
  let curChar = String.fromCharCode(s.charCodeAt(i));
  let nextChar = String.fromCharCode(s.charCodeAt(j));
  if (curChar === "\\" && nextChar === "u") {
    escaped += String.raw`\\u`;
    i++;
  } else {
    escaped += curChar;
  }
}

escaped += String.fromCharCode(s.charCodeAt(s.length - 1));

console.log(escaped);

But as I mentioned above, the text comes from an endpoint, so if we store it in a variable and then run the same for loop, it won't work.

let someVariable = "\u0000\u0000\u0000\u001A%<deployment-deploymentStepStart\fcdtoolHb3dccc41-8cf0-4069"
let s = String.raw({raw: someVariable});
// ... rest of the code above

Solution

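  • You can achieve this using JSON.stringify, which serializes control characters as escape sequences (\uXXXX, or a short form such as \f and \n where one exists):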

    var examplestring = `\u0000\u0000\u0000\u001A%<some-random-text\fcdtoolHxxx1-34e3-4069-b97c-xxxxxxxxxxx\u001E\n`
    //basic example
    console.log(examplestring)
    console.log(JSON.stringify(examplestring))
    console.log(JSON.stringify(examplestring).replaceAll('\\u','\\\\u'))
    
    //using your example code:
    var s = JSON.stringify(examplestring);
    var escaped =  String.raw``;
    
    for (let i = 0, j = i + 1; i < s.length - 1; i++, j = i + 1) {
      let curChar = String.fromCharCode(s.charCodeAt(i));
      let nextChar = String.fromCharCode(s.charCodeAt(j));
      if (curChar === "\\" && nextChar === "u") {
        escaped += String.raw`\\u`;
        i++;
      } else {
        escaped += curChar;
      }
    }
    
    escaped += String.fromCharCode(s.charCodeAt(s.length - 1));
    
    console.log(escaped);
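
Note that JSON.stringify also wraps the result in double quotes, escapes quotes and backslashes, and turns \f and \n into two-character sequences. If the form feed and newline should stay as real characters, a narrower character class may be enough. The following is only a sketch: the helper name escapeControlChars is hypothetical, and the choice to leave \b, \t, \n, \f and \r alone is an assumption about which characters should remain untouched.

```javascript
// Hypothetical helper: rewrites C0 control characters as literal \uXXXX text,
// but leaves \b, \t, \n, \f and \r as they are (an assumption about which
// characters should stay untouched).
function escapeControlChars(str) {
  // U+0000..U+0007, U+000B and U+000E..U+001F are the control characters
  // that are not \b (08), \t (09), \n (0A), \f (0C) or \r (0D).
  return str.replace(/[\u0000-\u0007\u000B\u000E-\u001F]/g, (chr) => {
    const hex = chr.charCodeAt(0).toString(16).toUpperCase();
    return "\\u" + hex.padStart(4, "0");
  });
}

const input = "\u0000\u001A%<some-random-text\fcdtool\u001E\n";
const escapedInput = escapeControlChars(input);

// JSON.stringify is used here only to make the result visible as a literal.
console.log(JSON.stringify(escapedInput));
// prints "\\u0000\\u001A%<some-random-text\fcdtool\\u001E\n"
```

Because the string is inspected character by character at runtime, this works on a value held in a variable (for example, a response body) and does not depend on how the string was written in source code.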