How can you strip non-ASCII characters from a string? (in C#)


Solution

  • // Requires: using System.Text.RegularExpressions;
    string s = "søme string";
    s = Regex.Replace(s, @"[^\u0000-\u007F]+", string.Empty);
    // s is now "sme string"
    

    Inside a character class, ^ is the negation operator: the class matches every character that is not listed, instead of every character that is. The \u####-\u#### part gives the range of characters in the class. \u0000-\u007F covers the first 128 Unicode code points, which are exactly the ASCII characters. So the pattern matches every run of non-ASCII characters (because of the negation and the + quantifier), and Replace swaps each match for the empty string.

    (as explained in a comment by Gordon Tucker Dec 11, 2009 at 21:11)
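
    For anyone who wants to paste this into a project, here is a minimal sketch of the same regex wrapped in a reusable method. The class name AsciiFilter and the method name StripNonAscii are made up for illustration and are not part of the original answer. The pattern itself is unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper; the names are illustrative, not from the original answer.
public static class AsciiFilter
{
    // Removes every run of characters outside U+0000..U+007F (the ASCII range).
    public static string StripNonAscii(string input)
    {
        return Regex.Replace(input, @"[^\u0000-\u007F]+", string.Empty);
    }

    public static void Main()
    {
        Console.WriteLine(StripNonAscii("søme string")); // prints "sme string"
        Console.WriteLine(StripNonAscii("plain ASCII")); // prints "plain ASCII"
    }
}
```

    Note that the characters are dropped, not transliterated: "ø" is removed rather than turned into "o". Because .NET regexes work on UTF-16 code units, both halves of a surrogate pair fall outside the ASCII range and are removed as well.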