
JavaScript string normalize() does not work in some cases - why?


In my debugger I can see that an object is retrieved with a Unicode character in one of its values, e.g.

{
    name: "(Other\uff09"
}

If this object is referenced by the variable myObj, in the debugger I see that

myObj.name.normalize()

returns

"(Other\uff09"

If instead I use

"(Other\uff09".normalize()

it returns

"(Other）"

Why?


Solution

  • I've learnt a little since posting this question. Initially I thought that \uff09 was equivalent to ). It is not: it is ）, the Fullwidth Right Parenthesis (U+FF09).

    This essentially means that the opening and closing brackets do not match, e.g. ( is not closed by ）. This is causing parsing issues for the NodeMailer AddressParser module I'm using.

    To normalize it I'm using

    JSON.parse(`"(Other\uff09"`).normalize("NFKC")
    

    which translates the string to (Other) rather than (Other）.

    Breaking this down

    JSON.parse(`"(Other\uff09"`)
    

    where the double quotes are required, produces (Other）, i.e. it converts the Unicode escape sequence into the character it represents.

    Then

    .normalize("NFKC")
    

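    converts that into (Other), because NFKC applies compatibility mappings, which replace the full-width parenthesis with its ASCII equivalent. The default form, NFC, leaves ） unchanged.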

    This worked for me in this sample. I still need to see if it scales up.
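
A minimal sketch of the behaviour described above, in case it helps anyone checking their own data. The variable names and the `hex` helper are hypothetical and only for illustration. The second half rests on an assumption the question does not confirm: that the retrieved string holds the six literal characters \uff09 rather than the single character U+FF09, which would explain why normalize() appeared to do nothing.

```javascript
// Hypothetical sample values, not taken from any real data set.
const fullWidth = "(Other\uff09";      // escape resolved by the JS parser: ends with U+FF09
const literalEscape = "(Other\\uff09"; // ends with the six characters \ u f f 0 9 (assumed case)

// Hypothetical helper: hex code point of the last character of a string.
const hex = (s) => s.codePointAt(s.length - 1).toString(16);

const nfc = fullWidth.normalize();        // the default form is NFC
const nfkc = fullWidth.normalize("NFKC"); // compatibility composition

console.log(hex(nfc));  // ff09, NFC keeps the full-width parenthesis
console.log(hex(nfkc)); // 29, NFKC maps it to the ASCII parenthesis

// A string holding the literal escape text is plain ASCII, so normalizing changes nothing...
const untouched = literalEscape.normalize("NFKC");
console.log(untouched === literalEscape); // true

// ...until the escape is decoded, here by parsing it as a JSON string.
const decoded = JSON.parse(`"${literalEscape}"`);
const fixed = decoded.normalize("NFKC");
console.log(fixed === "(Other)"); // true
```

If the value already contains the real U+FF09 character, the JSON.parse step is not needed and .normalize("NFKC") alone should be enough. Keep in mind that NFKC also rewrites other compatibility characters (ligatures, full-width letters and so on), which may or may not be what you want.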