How can you strip non-ASCII characters from a string? (in C#)
using System.Text.RegularExpressions;
string s = "søme string";
s = Regex.Replace(s, @"[^\u0000-\u007F]+", string.Empty); // s is now "sme string"
At the start of a character class, ^ is the not operator. It tells the regex to match every character that is not listed, instead of every character that is. The \u####-\u#### part says which range of characters is listed. \u0000-\u007F covers the first 128 Unicode code points, which are exactly the ASCII characters (UTF-8 encodes them identically). So the pattern matches every run of non-ASCII characters (because of the not), and the replace removes everything that matches.
(as explained in a comment by Gordon Tucker Dec 11, 2009 at 21:11)
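If you need this in more than one place, the one-liner above can be wrapped in a small helper. This is only a sketch: the class name AsciiFilter and the method name StripNonAscii are made up for illustration and are not part of .NET. Only Regex and Regex.Replace come from the framework.

```csharp
using System.Text.RegularExpressions;

public static class AsciiFilter
{
    // Hypothetical helper, not a framework API.
    // The pattern matches one or more consecutive characters outside U+0000-U+007F.
    private static readonly Regex NonAscii = new Regex(@"[^\u0000-\u007F]+");

    // Returns a copy of the input with every non-ASCII character removed.
    public static string StripNonAscii(string input)
    {
        return NonAscii.Replace(input, string.Empty);
    }
}
```

Calling AsciiFilter.StripNonAscii("søme string") gives "sme string". Note that the characters are dropped, not transliterated: ø is removed rather than turned into o. Characters outside the Basic Multilingual Plane (emoji, for example) are stored as surrogate pairs in a .NET string, and both halves fall outside the ASCII range, so they are removed as well.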