php regex unicode pcre character-properties

Regex - Unicode Properties Reference and Examples


I feel lost with the Regex Unicode Properties presented by RegexBuddy. I cannot distinguish between any of the Number properties, and the Math symbol property only seems to match + but not -, *, / or ^, for instance.


Is there any documentation / reference with examples on regular expressions Unicode properties?


Solution

  • A list of Unicode properties can be found in http://www.unicode.org/Public/UNIDATA/PropList.txt.

    The properties for each character, including its general category (the third semicolon-separated field, such as Sm), can be found in http://www.unicode.org/Public/UNIDATA/UnicodeData.txt (1.2 MB).

    In your case,

    • + (PLUS SIGN) is Sm (Symbol, math),
    • - (HYPHEN-MINUS) is Pd (Punctuation, dash),
    • * (ASTERISK) is Po (Punctuation, other),
    • / (SOLIDUS) is also Po, and
    • ^ (CIRCUMFLEX ACCENT) is Sk (Symbol, modifier).

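    Only + is classified as a math symbol, which is why the Math symbol property misses the others. You're better off matching them with an explicit character class: [-+*/^].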
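
To check the categories listed above without reading UnicodeData.txt by hand, here is a minimal sketch in Python (chosen only for illustration; the question itself concerns PCRE-style properties). It assumes the standard `unicodedata` module, since Python's built-in `re` does not support `\p{...}` property escapes. The helper names `general_categories` and `find_operators` are hypothetical, made up for this example.

```python
import re
import unicodedata

# The five operators from the question.
OPERATORS = "+-*/^"


def general_categories(chars):
    """Map each character to its Unicode general category (e.g. 'Sm')."""
    return {ch: unicodedata.category(ch) for ch in chars}


# Explicit character class. The hyphen is placed first so it is literal,
# and the caret is not first, so it does not negate the class.
OPERATOR_RE = re.compile(r"[-+*/^]")


def find_operators(text):
    """Return every arithmetic operator found in text, in order."""
    return OPERATOR_RE.findall(text)


if __name__ == "__main__":
    # Print each operator with its Unicode name and general category.
    for ch, cat in general_categories(OPERATORS).items():
        print(ch, unicodedata.name(ch), cat)
    print(find_operators("2^3 + 4*5 - 6/2"))
```

The lookup shows the five characters spread over four categories, so no single property covers them all; the explicit class matches exactly the characters you list and nothing else.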