regex, pcre

Why does the fountain emoji ⛲️ match the anchor emoji ⚓️ inside a character class in a regular expression?


In PCRE2, the regular expression /[⚓️]/ unexpectedly finds a match in the string "⛲️".


The unexpected match happens only inside a character class (square brackets); the plain pattern /⚓️/ does not match "⛲️".
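A minimal reproduction of the behaviour, using Python's `re` module as a stand-in for PCRE2 (an assumption for illustration: it is not PCRE2, but its character classes also match one code point at a time). The emojis are built from explicit code points so the invisible U+FE0F is unambiguous in the source.

```python
import re

# Build the emojis from explicit code points (assumed encoding of the
# emojis in the question: base character + VARIATION SELECTOR-16).
ANCHOR = "\u2693\ufe0f"    # ⚓️
FOUNTAIN = "\u26f2\ufe0f"  # ⛲️

# Anchor emoji inside a character class, searched in the fountain emoji.
in_class = re.search("[" + ANCHOR + "]", FOUNTAIN)
# Anchor emoji as a plain literal, searched in the fountain emoji.
as_literal = re.search(ANCHOR, FOUNTAIN)

print(in_class is not None)  # True: the class finds a match
print(as_literal is None)    # True: the literal does not
```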

Can somebody explain why this happens?

By the way, PCRE1 behaves as expected: no matches.


Solution

  • The ⚓️ emoji is a sequence of two Unicode code points: \x{2693} (ANCHOR) followed by \x{FE0F} (VARIATION SELECTOR-16). You can verify this: the regex \x{2693}\x{FE0F} matches ⚓️.

    A character class matches exactly one code point, so [\x{2693}\x{FE0F}] matches either \x{2693} or \x{FE0F} on its own. The ⛲️ emoji (\x{26F2}\x{FE0F}) ends with the same variation selector \x{FE0F}, so the class finds a match in ⛲️ just as it does in ⚓️.

    As a workaround, put the emojis into a non-capturing group with alternation rather than a character class. For example, (?:⚓️|[a-z0-9]) matches either the whole ⚓️ sequence or a single lowercase ASCII letter or digit.
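The explanation and the workaround can be checked with a short sketch. It again uses Python's `re` as a stand-in for PCRE2 (an assumption; the class-matches-one-code-point rule is the same here), lists the code points of both emojis with `unicodedata`, shows that the class matches the shared U+FE0F, and confirms that the non-capturing group only matches the complete ⚓️ sequence.

```python
import re
import unicodedata

ANCHOR = "\u2693\ufe0f"    # ⚓️
FOUNTAIN = "\u26f2\ufe0f"  # ⛲️

# List the code points that make up each emoji.
for emoji in (ANCHOR, FOUNTAIN):
    print([f"U+{ord(c):04X} {unicodedata.name(c)}" for c in emoji])
# ['U+2693 ANCHOR', 'U+FE0F VARIATION SELECTOR-16']
# ['U+26F2 FOUNTAIN', 'U+FE0F VARIATION SELECTOR-16']

# The character class matches the shared variation selector at index 1.
m = re.search("[" + ANCHOR + "]", FOUNTAIN)
print(m.span())  # (1, 2)

# Workaround: alternation in a non-capturing group keeps the sequence intact.
pattern = re.compile("(?:" + ANCHOR + "|[a-z0-9])")
print(pattern.search(FOUNTAIN))                   # None
print(pattern.search(ANCHOR).group() == ANCHOR)   # True
print(pattern.search("x").group())                # x
```

Note that the group puts the emoji first in the alternation only for readability; because the two alternatives cannot match the same text, their order does not change the result here.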