I'm trying to unescape XML entities in a string in .NET (C#), but I can't get it to work correctly.
For example, if I have the string AT&amp;T, it should be translated to AT&T.
One way is to use HttpUtility.HtmlDecode(), but that's for HTML.
So I have two questions about this:
1. Is it safe to use HttpUtility.HtmlDecode() for decoding XML entities?
2. How do I use XmlReader (or something similar) to do this? I have tried the following, but it always returns an empty string:
static string ReplaceEscapes(string text)
{
    StringReader reader = new StringReader(text);
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.ConformanceLevel = ConformanceLevel.Fragment;
    using (XmlReader xmlReader = XmlReader.Create(reader, settings))
    {
        return xmlReader.ReadString();
    }
}
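Your solution #2 can work, but you need to call xmlReader.Read() (or xmlReader.MoveToContent()) before ReadString(). Until then the reader is not positioned on any node, so ReadString() has nothing to read and returns an empty string.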
I guess #1 would also be acceptable, even though there are edge cases like &reg;, which is a valid HTML entity but not an XML entity. What should your unescaper do with it? Throw an exception, as a proper XML parser would, or just return “®”, as the HTML parser does?
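The &reg; edge case can be seen with a small comparison, sketched under two assumptions: it uses `System.Net.WebUtility.HtmlDecode` rather than `HttpUtility.HtmlDecode` (same HTML-decoding idea, but no `System.Web` reference needed), and `EntityComparison`/`XmlDecode` are hypothetical names repeating the XmlReader approach above. Because XML predefines only `&lt;`, `&gt;`, `&amp;`, `&apos;` and `&quot;`, the XML reader should reject the undeclared `&reg;`, while the HTML decoder maps it to ®.

```csharp
using System;
using System.IO;
using System.Net;
using System.Xml;

static class EntityComparison
{
    // Strict XML decoding, same approach as the corrected ReplaceEscapes.
    internal static string XmlDecode(string text)
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        using (var stringReader = new StringReader(text))
        using (XmlReader reader = XmlReader.Create(stringReader, settings))
        {
            reader.MoveToContent();
            return reader.ReadString();
        }
    }

    static void Main()
    {
        string input = "AT&amp;T &reg;";

        // HTML decoding knows the named entity &reg; and yields "AT&T ®".
        Console.WriteLine(WebUtility.HtmlDecode(input));

        // Without a DTD declaring "reg", the XML reader reports an
        // undeclared entity reference as an XmlException.
        try
        {
            Console.WriteLine(XmlDecode(input));
        }
        catch (XmlException ex)
        {
            Console.WriteLine("XmlException: " + ex.Message);
        }
    }
}
```

Which behavior is right depends on whether the input is guaranteed to be XML: the strict reader surfaces malformed input, while the HTML decoder silently accepts a larger entity set.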