I am getting hexadecimal representations of Unicode characters in my string and want to replace them with an empty string. More specifically, I am trying to match all values within \u0000-\u007F in a string using a regex and replace them with an empty string in C#.
Example 1:
InputString: "\u007FTestString"
ExpectedResult: TestString
Example 2:
InputString: "\u007FTestString\U0000"
ExpectedResult: TestString
My current solution is
if (!string.IsNullOrWhiteSpace(testString))
{
    return Regex.Replace(testString, @"[^\u0000-\u007F]", string.Empty);
}
but it does not match the hexadecimal representation of the non-ASCII character. How do I get it to match the \u0000-\u007F sequences in the string?
Any help is appreciated. Thank you!
You can use
var result = Regex.Replace(@"\u007FTestString\U0000", @"\\[uU]00[0-7][0-9A-Fa-f]", "");
The @"..." verbatim string literal syntax is required to make all backslashes literal characters that do not form any string escape sequences.
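To illustrate the difference the verbatim syntax makes, here is a minimal sketch (the class name VerbatimDemo and the sample text are made up for illustration). In a regular literal the compiler converts \u007F into a single character, while in a verbatim literal the six characters stay in the string as text:

```csharp
using System;

class VerbatimDemo
{
    static void Main()
    {
        // Regular literal: the compiler turns \u007F into one DEL character.
        string escaped = "\u007FTest";

        // Verbatim literal: backslash, 'u', '0', '0', '7', 'F' are kept as six characters.
        string literal = @"\u007FTest";

        Console.WriteLine(escaped.Length); // 5
        Console.WriteLine(literal.Length); // 10
    }
}
```

This is probably why the original character class did not match: it looks for actual characters in that range, not for the six-character text of an escape sequence.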
Pattern details:
\\ - a backslash
[uU] - u or U
00 - two zeros
[0-7] - a digit from zero to seven
[0-9A-Fa-f] - an ASCII hex digit char.
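Putting the pattern above together with the null check from the question, a small sketch could look like this. The class and method names (EscapeStripper, StripAsciiEscapes) are hypothetical, and the inputs are assumed to contain the escape sequences as literal text, as in the two examples from the question:

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapeStripper
{
    // Hypothetical helper: removes literal "\u00XX" / "\U00XX" text
    // where XX is in the hex range 00-7F.
    public static string StripAsciiEscapes(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }
        return Regex.Replace(input, @"\\[uU]00[0-7][0-9A-Fa-f]", string.Empty);
    }

    static void Main()
    {
        // Verbatim literals keep the backslashes as plain characters.
        Console.WriteLine(StripAsciiEscapes(@"\u007FTestString"));        // TestString
        Console.WriteLine(StripAsciiEscapes(@"\u007FTestString\U0000"));  // TestString
    }
}
```

Note that the helper only removes the textual escape sequences. If the string might also contain the actual control characters, a separate character-class replacement would be needed for those.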