Tags: javascript, ios, regex, unicode, emoji

Emojis corrupted on iOS when using negation with unicode character class escapes


I'm using regular expressions to remove non-Latin and non-emoji characters from strings.

As Unicode character class escapes are now widely supported, I used them to simplify my expressions.

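// Note: ( and ) inside [...] are literal characters, not grouping.
const regex = new RegExp('[^(\\d\\s\\p{Script=Latin}\\p{gc=Punctuation}\\p{Extended_Pictographic})]+', 'gui');

// Remove every run of characters that is not a digit, whitespace, Latin letter, punctuation, or pictographic (emoji) character.
function removeUnsupportedChars(txt: string) {
   return txt.replace(regex, '');
}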

This works on PC and on Android. However, on iOS, when using this regular expression, emojis get corrupted and shown as squares.


I created a minimal CodePen that reproduces the scenario with a simplified regex. It seems that on iOS any use of negation on the Extended_Pictographic class (or any of the other emoji classes) corrupts the emojis.
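
The CodePen itself is not reproduced here. As a rough sketch of the kind of check it describes, assuming a simplified pattern that negates only `\p{Extended_Pictographic}` (the names `negatedEmoji`, `stripNonEmoji` and `toCodeUnits` are made up for illustration), the following prints the UTF-16 code units that survive the replacement, so a damaged surrogate pair would be visible:

```typescript
// Hypothetical minimal check, not the original CodePen.
// Simplified pattern: remove every run of characters that is NOT Extended_Pictographic.
const negatedEmoji = new RegExp('[^\\p{Extended_Pictographic}]+', 'gui');

function stripNonEmoji(txt: string): string {
    return txt.replace(negatedEmoji, '');
}

// Render a string as hex UTF-16 code units so a split surrogate pair is easy to spot.
function toCodeUnits(txt: string): string {
    const units: string[] = [];
    for (let i = 0; i < txt.length; i++) {
        units.push(txt.charCodeAt(i).toString(16));
    }
    return units.join(' ');
}

// 'abc ', then U+1F600 (grinning face) written as its surrogate pair, then ' def'.
const sample = 'abc \uD83D\uDE00 def';
console.log(toCodeUnits(stripNonEmoji(sample)));
```

An engine that handles the pattern as expected prints `d83d de00`, the intact surrogate pair. Any other output on a given device would suggest the pair was altered by the replacement.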


Is this a known issue on iOS? Any known workarounds (other than using explicit emoji lists)?


Solution

  • I found a workaround, but I'm still curious as to why negation of Unicode character classes doesn't work on iOS.

    Instead of using a negative regex with replace, I chose to use a positive regex with match and join the pieces that DO match:

    const regex = /[\d\s\p{Script=Latin}\p{gc=Punctuation}\p{Currency_Symbol}\p{Emoji_Presentation}\p{Extended_Pictographic}]*/gui;

    function removeUnsupportedChars(txt: string) {
        // With the g flag, match() returns all matching runs; joining them keeps only the allowed characters.
        const matches = txt.match(regex) || [];
        return matches.join('');
    }
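
As a usage sketch of the positive-match approach, here is a self-contained variant with a sample input. The names `allowedRuns` and `keepSupportedChars` are illustrative, and two details are assumptions rather than part of the answer above: the quantifier is `+` instead of `*` (so `match` does not collect empty strings), and the `i` flag is omitted because `Script=Latin` already covers both cases.

```typescript
// Sketch only: `allowedRuns` and `keepSupportedChars` are hypothetical names.
// Built with the RegExp constructor, so backslashes are doubled inside the string.
const allowedRuns = new RegExp(
    '[\\d\\s\\p{Script=Latin}\\p{gc=Punctuation}\\p{Currency_Symbol}' +
    '\\p{Emoji_Presentation}\\p{Extended_Pictographic}]+',
    'gu'
);

function keepSupportedChars(txt: string): string {
    // With the g flag, match() returns every matching run, or null when nothing matches.
    const runs = txt.match(allowedRuns) || [];
    return runs.join('');
}

// The Cyrillic word is dropped; the Latin word, the space and the emoji (U+1F600) are kept.
console.log(keepSupportedChars('Hello Привет\uD83D\uDE00')); // "Hello " followed by the emoji
```

Because the pattern never has to match "everything except emoji", this variant avoids the negated class entirely, which is the point of the workaround.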