c# regex regex-negation

Regex to exclude non-word characters but leave spaces


I am trying to write a regex to stop a user from entering invalid characters into a postcode field.

From this link I managed to exclude all "non-word" characters like so:

Regex regex = new Regex(@"[\W_]+");
string cleanText = regex.Replace(messyText, "").ToUpper();

But this also strips out the space characters, which I want to keep.

I am sure this is possible but I find regex very confusing!

Can someone help out with an explanation of the regex pattern used?


Solution

  • You may use character class subtraction:

    [\W_-[\s]]+
    

    It matches one or more non-word characters or underscores, except for whitespace characters.

    To exempt only horizontal whitespace characters from the match (so line breaks are still matched), use [\p{Zs}\t] in the subtraction part:

    [\W_-[\p{Zs}\t]]+
    

    To exempt only vertical whitespace characters (line break chars) from the match, so spaces and tabs are still matched, use [\n\v\f\r\u0085\u2028\u2029] in the subtraction part:

    [\W_-[\n\v\f\r\u0085\u2028\u2029]]+
    

    A solution that avoids character class subtraction (and is therefore more portable) is

    [^\w\s]+
    

    It matches one or more characters other than word and whitespace characters. Note that this will not match _, because the underscore is considered a word character (this matters in string tokenization scenarios, where (?:[^\w\s]|_)+ or [_\W-[\s]] is preferable).
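
As a rough illustration of how the patterns above behave with the Replace call from the question, here is a minimal sketch. The class name PostcodeCleaner, its method names and the sample input are made up for this example and are not part of the original question or answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class PostcodeCleaner
{
    // .NET-specific: character class subtraction takes \s out of the set [\W_],
    // so whitespace is left alone while other non-word chars and "_" are matched.
    private static readonly Regex Subtraction = new Regex(@"[\W_-[\s]]+");

    // Portable form: anything that is neither a word char nor whitespace,
    // plus an explicit alternative for "_", which \w would otherwise keep.
    private static readonly Regex Portable = new Regex(@"(?:[^\w\s]|_)+");

    // Portable form without the underscore alternative: "_" survives.
    private static readonly Regex PortableKeepsUnderscore = new Regex(@"[^\w\s]+");

    public static string CleanWithSubtraction(string messyText) =>
        Subtraction.Replace(messyText, "").ToUpper();

    public static string CleanPortable(string messyText) =>
        Portable.Replace(messyText, "").ToUpper();

    public static string CleanKeepingUnderscore(string messyText) =>
        PortableKeepsUnderscore.Replace(messyText, "").ToUpper();
}
```

With the sample input "sw1a_ 1aa!", CleanWithSubtraction and CleanPortable both return "SW1A 1AA", while CleanKeepingUnderscore returns "SW1A_ 1AA" because [^\w\s]+ does not match the underscore.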