c# regex non-english

Foreign language characters in Regular expression in C#


In C# code, I am trying to validate a string that contains Chinese characters: "中文ABC123".

When I use a general alphanumeric pattern, "^[a-zA-Z0-9\s]+$", the match fails for "中文ABC123".

What do I need to add to the expression so that it accepts these characters in C#?


Solution

  • To match a letter from any language, use:

    \p{L}
    

    If you also want to match digits:

    [\p{L}\p{Nd}]+
    

    \p{L} ... matches any character in the Unicode category "Letter".
              It is shorthand for [\p{Ll}\p{Lu}\p{Lt}\p{Lm}\p{Lo}]:
                \p{Ll} ... matches lowercase letters. (abc)
                \p{Lu} ... matches uppercase letters. (ABC)
                \p{Lt} ... matches titlecase letters.
                \p{Lm} ... matches modifier letters.
                \p{Lo} ... matches other letters, including letters without case. (中文)

    \p{Nd} ... matches any character in the Unicode category "Number, decimal digit".
               This includes digits from other scripts, not only ASCII 0-9.

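    For the pattern in the question, replace ^[a-zA-Z0-9\s]+$ with ^[\p{L}0-9\s]+$
    (or use \p{Nd} in place of 0-9 to also accept decimal digits from other scripts).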
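
As a minimal sketch of how the replacement pattern could be used in C#: the class name InputValidator and the method IsValid are hypothetical names chosen for illustration, and the pattern is the ^[\p{L}0-9\s]+$ suggested above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class InputValidator
{
    // \p{L} = any Unicode letter, 0-9 = ASCII digits, \s = whitespace.
    // The verbatim string (@"...") avoids doubling the backslashes.
    private static readonly Regex Allowed = new Regex(@"^[\p{L}0-9\s]+$");

    public static bool IsValid(string input)
    {
        return Allowed.IsMatch(input);
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(InputValidator.IsValid("中文ABC123"));  // True
        Console.WriteLine(InputValidator.IsValid("中文 ABC 123")); // True (whitespace allowed)
        Console.WriteLine(InputValidator.IsValid("中文-ABC"));     // False ('-' is not a letter)
        Console.WriteLine(InputValidator.IsValid(""));            // False ('+' needs one character)
    }
}
```

Swapping 0-9 for \p{Nd} in the pattern would additionally accept decimal digits from other scripts, which may or may not be wanted depending on what the validated value is used for.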