I'm using regular expressions to remove non-Latin and non-emoji characters from strings.
As Unicode character class escapes are now widely supported, I used them to simplify my expressions.
const regex = new RegExp('[^(\\d\\s\\p{Script=Latin}\\p{gc=Punctuation}\\p{Extended_Pictographic})]+', 'gui');
function removeUnsupportedChars(txt: string) {
  // Strip every run of characters that is not in the allowed set.
  return txt.replace(regex, '');
}
This works on PC and on Android. On iOS, however, this regular expression corrupts emojis, and they are shown as squares.
I created a minimal CodePen that reproduces the scenario with a simplified regex. It seems that on iOS any negation of the Extended_Pictographic class (or of any of the other emoji classes) corrupts the emojis.
Is this a known issue on iOS? Any known workarounds (other than using explicit emoji lists)?
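One way to narrow this down is to check whether the output still contains well-formed UTF-16. The sketch below is only a diagnostic idea, under the unverified assumption that the squares come from surrogate pairs being split. The helper name hasLoneSurrogate and the simplified pattern are made up for illustration and are not the ones from the CodePen.

```typescript
// Hypothetical helper (not from the original post): reports whether a string
// contains a UTF-16 surrogate that is not part of a valid high/low pair.
function hasLoneSurrogate(text: string): boolean {
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: it must be followed by a low surrogate.
      const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
      if (next < 0xdc00 || next > 0xdfff) {
        return true;
      }
      i++; // skip the low surrogate of a valid pair
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      return true;
    }
  }
  return false;
}

// Simplified negated class (an assumption), built from a string as in the
// original snippet and using the same flags.
const negated = new RegExp('[^\\p{Extended_Pictographic}]+', 'gui');
const cleaned = 'abc 😀 xyz'.replace(negated, '');

// Logs the cleaned string and whether it contains a broken surrogate pair.
console.log(cleaned, hasLoneSurrogate(cleaned));
```

If this logs true on the affected device and false elsewhere, the engine is probably matching half of a surrogate pair. If it logs false everywhere, the cause is more likely somewhere else.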
I found a workaround, but I'm still curious as to why negation of Unicode character classes doesn't work on iOS.
I chose to use a positive regex and use match to combine the pieces that DO match the regex, instead of using a negative regex with replace:
const regex = /[\d\s\p{Script=Latin}\p{gc=Punctuation}\p{Currency_Symbol}\p{Emoji_Presentation}\p{Extended_Pictographic}]*/gui;
function removeUnsupportedChars(txt: string) {
  // match() returns null when nothing matches, so fall back to an empty array.
  const matches = txt.match(regex) || [];
  return matches.join('');
}
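For anyone who wants to try the allow-list approach in isolation, here is a self-contained sketch with a sample input. The names allowed and keepSupportedChars are hypothetical. I also swapped * for + so that match does not return empty strings, which should not change the joined result. The pattern is built from a string so the property escapes are only parsed at runtime.

```typescript
// Allow-list pattern from the workaround, with + instead of * (my change).
const allowed = new RegExp(
  '[\\d\\s\\p{Script=Latin}\\p{gc=Punctuation}\\p{Currency_Symbol}' +
    '\\p{Emoji_Presentation}\\p{Extended_Pictographic}]+',
  'gui'
);

// Hypothetical name: keeps only the runs of characters that match the allow-list.
function keepSupportedChars(txt: string): string {
  // match() returns null when nothing matches, so fall back to an empty array.
  const matches = txt.match(allowed) || [];
  return matches.join('');
}

// The Cyrillic word is dropped; the spaces on both sides of it are kept,
// so a conforming engine prints "Hello,  😀!" with two spaces.
console.log(keepSupportedChars('Hello, мир 😀!'));
```

Both spaces around the removed word survive because whitespace is in the allowed set. Collapsing them would need a separate step.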