c#non-ascii-characters

How to replace non-ASCII Western Latin characters like '┌', '├', '⌐', '┐', '┴' in XML with C#


How can I remove non-ASCII characters, such as the inverted "T" and "L" shapes, from XML in C#?

I have tried sanitizing the XML string with a character check like:

(character >= 0x20 && character <= 0xD7FF) ||
(character >= 0xE000 && character <= 0xFFFD) ||
(character >= 0x10000 && character <= 0x10FFFF)

I also tried Regex replacements like these:

Regex.Replace(inputText, @"[^><#\w\.@-]", "");
(or)
string str = str.replace(/[^A-Za-z 0-9 \.,\?""!@#\$%\^&\*\(\)-_=\+;:<>\/\\\|\}\{\[\]`~]*/g, '')

And a pattern replacement like this:

string pattern = @"#x((10?|[2-F])FFF[EF]|FDD[0-9A-F]|7F|8[0-46-9A-F]9[0-9A-F])";

And finally with

XmlConvert.VerifyXmlChars(text);

But none of these worked. The characters look like this: '┌', '├', '⌐', '┐', '┴'

Please see this link https://en.wikipedia.org/wiki/Western_Latin_character_sets_%28computing%29

└ U+2514 C0 C0
┘ U+2518 D9 D9

Please help me out with this. Thanks in advance.


Solution

  • Try this:
    
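    string s = "søme string";
    // Remove every character outside the ASCII range U+0000..U+007F (needs System.Text.RegularExpressions).
    s = Regex.Replace(s, @"[^\u0000-\u007F]", string.Empty);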
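
Some background on why the earlier attempts let these characters through: the box-drawing characters such as '┌' (U+250C) and '└' (U+2514) fall inside the 0x20 to 0xD7FF range, so they are valid XML characters. A range check like the one in the question keeps them, and XmlConvert.VerifyXmlChars has no reason to reject them. A minimal sketch of two alternatives follows, assuming the goal is either to drop everything non-ASCII or to drop only the Unicode Box Drawing block. The class name XmlTextCleaner and its method names are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative and not from the original post.
public static class XmlTextCleaner
{
    // Matches any UTF-16 code unit outside the ASCII range U+0000..U+007F.
    private static readonly Regex NonAscii = new Regex(@"[^\u0000-\u007F]");

    // Matches only the Unicode Box Drawing block, U+2500..U+257F.
    private static readonly Regex BoxDrawing = new Regex(@"[\u2500-\u257F]");

    // Strips every non-ASCII character, including accented letters such as 'ø'.
    public static string RemoveNonAscii(string input)
    {
        return NonAscii.Replace(input, string.Empty);
    }

    // Strips only box-drawing characters and leaves other non-ASCII text intact.
    public static string RemoveBoxDrawing(string input)
    {
        return BoxDrawing.Replace(input, string.Empty);
    }
}
```

For example, XmlTextCleaner.RemoveNonAscii("<a>┌søme┘</a>") gives "<a>sme</a>", while XmlTextCleaner.RemoveBoxDrawing on the same input gives "<a>søme</a>". Note that '⌐' (U+2310) is outside the Box Drawing block, so the narrower pattern would need that code point added if it has to be removed as well.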