I'm trying to filter invalid characters from an XML file, and have the following test project:
using System;
using System.Text.RegularExpressions;
class Program
{
    private static Regex _invalidXMLChars = new Regex(@"(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F\uFEFF\uFFFE\uFFFF]", RegexOptions.Compiled);
    static void Main(string[] args)
    {
        // The test string contains the character reference &#xF; as literal text.
        var text = "assd&#xF;abv";
        Console.WriteLine(_invalidXMLChars.IsMatch(text));
    }
}
This test project outputs the expected result (True) in .NET Fiddle.
But when I run the same code in my project, the invalid character is not found and the output is "False".
Why does this work in .NET Fiddle, but not in my project?
Altering the source XML file is not an option.
Visual Studio is right. None of the characters &, #, x, F or ; is matched by your Regex. However, in HTML &#xF; translates to the C# counterpart \u000f, which is then matched because of the range \x0E-\x1F in the Regex definition.
Using \u000f in Visual Studio gives a match:
using System;
using System.Text.RegularExpressions;
public class Program
{
    private static Regex _invalidXMLChars = new Regex(@"(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F\uFEFF\uFFFE\uFFFF]", RegexOptions.Compiled);
    public static void Main()
    {
        var text = "assd\u000fabv";
        Console.WriteLine(_invalidXMLChars.IsMatch(text));
    }
}
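To see both behaviours side by side, here is a minimal sketch that runs the same Regex against the literal text &#xF; and against the decoded character. The class name EntityDemo and the helper HasInvalidChars are made up for this illustration, and using System.Net.WebUtility.HtmlDecode to turn the reference into U+000F is my assumption about one way to decode it, not something from the question.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class EntityDemo
{
    // Same pattern as in the question.
    private static readonly Regex InvalidXmlChars = new Regex(
        @"(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F\uFEFF\uFFFE\uFFFF]",
        RegexOptions.Compiled);

    // Hypothetical helper: true if the text contains a character the pattern rejects.
    public static bool HasInvalidChars(string text) => InvalidXmlChars.IsMatch(text);

    public static void Main()
    {
        // Five ordinary characters (& # x F ;), none of which is in the pattern.
        var escaped = "assd&#xF;abv";

        // Assumption: decode the character reference so the string holds U+000F itself.
        var decoded = WebUtility.HtmlDecode(escaped);

        Console.WriteLine(HasInvalidChars(escaped)); // False
        Console.WriteLine(HasInvalidChars(decoded)); // True
    }
}
```

Keep in mind that HtmlDecode also decodes other references such as &amp;, so this is only meant to demonstrate the difference, not as a drop-in filter for an XML file.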