I am trying to write a Regex
to stop a user entering invalid characters into a postcode field.
From this link I managed to exclude all "non-word" characters like so:
Regex regex = new Regex(@"[\W_]+");
string cleanText = regex.Replace(messyText, "").ToUpper();
But this also strips out the space characters, which I want to keep.
I am sure this is possible but I find regex very confusing!
Can someone help out with an explanation of the regex pattern used?
You may use character class subtraction:
[\W_-[\s]]+
It matches one or more non-word and underscore symbols with the exception of any whitespace characters.
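As a rough sketch of how this could slot into the code from the question (the `PostcodeCleaner` class, the `Clean` method and the sample input are made up for illustration, not taken from the original post):

```csharp
using System;
using System.Text.RegularExpressions;

public static class PostcodeCleaner
{
    // Base class [\W_] minus [\s]: non-word characters and underscores,
    // but never whitespace. The subtraction must be the last item in the class.
    private static readonly Regex Invalid = new Regex(@"[\W_-[\s]]+");

    // Hypothetical helper: removes the invalid characters and upper-cases the rest.
    public static string Clean(string messyText)
    {
        return Invalid.Replace(messyText, "").ToUpper();
    }

    public static void Main()
    {
        // The underscore and "!" are removed, the space is kept.
        Console.WriteLine(Clean("sw1a_ 1aa!")); // prints "SW1A 1AA"
    }
}
```

Note that `ToUpper()` is culture-sensitive; `ToUpperInvariant()` may be the safer choice if the code can run under arbitrary cultures.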
To exclude just horizontal whitespace characters use [\p{Zs}\t]
in the subtraction part:
[\W_-[\p{Zs}\t]]+
To exclude just vertical whitespace characters (line break chars) use [\n\v\f\r\u0085\u2028\u2029]
in the subtraction part:
[\W_-[\n\v\f\r\u0085\u2028\u2029]]+
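To see the difference between the two subtraction variants, here is a small sketch (the class name, method names and sample string are assumptions chosen for the demo) that runs both patterns over the same input:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceSubtractionDemo
{
    // Removes non-word chars and "_", but keeps horizontal whitespace (spaces, tabs).
    public static string KeepHorizontal(string input)
    {
        return Regex.Replace(input, @"[\W_-[\p{Zs}\t]]+", "");
    }

    // Removes non-word chars and "_", but keeps line break characters.
    public static string KeepVertical(string input)
    {
        return Regex.Replace(input, @"[\W_-[\n\v\f\r\u0085\u2028\u2029]]+", "");
    }

    public static void Main()
    {
        string sample = "a-b \t\nc!";

        // The space and tab survive; "-", the newline and "!" are removed.
        Console.WriteLine(KeepHorizontal(sample) == "ab \tc"); // prints "True"

        // Only the newline survives; "-", the space, the tab and "!" are removed.
        Console.WriteLine(KeepVertical(sample) == "ab\nc");    // prints "True"
    }
}
```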
A solution without character class subtraction (which is more portable) is
[^\w\s]+
It matches one or more characters other than word and whitespace characters. Note that this still won't match _,
which is considered a word character (this is important in string tokenization scenarios, where (?:[^\w\s]|_)+
or [_\W-[\s]]
is preferable).
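A quick sketch of the underscore difference (the class name, method names and sample input are invented for this example):

```csharp
using System;
using System.Text.RegularExpressions;

public static class PortablePatternDemo
{
    // Negated class: removes anything that is neither a word char nor whitespace.
    // "_" counts as a word character, so it is left in place.
    public static string StripNonWord(string input)
    {
        return Regex.Replace(input, @"[^\w\s]+", "");
    }

    // Same idea, with "_" added through an alternation so it is removed too.
    public static string StripNonWordAndUnderscore(string input)
    {
        return Regex.Replace(input, @"(?:[^\w\s]|_)+", "");
    }

    public static void Main()
    {
        string sample = "ab_c d-e";
        Console.WriteLine(StripNonWord(sample));              // prints "ab_c de"
        Console.WriteLine(StripNonWordAndUnderscore(sample)); // prints "abc de"
    }
}
```

Neither pattern relies on character class subtraction, so both should carry over to regex flavors that lack that feature.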