
Unescaping XML entities using XmlReader in .NET?


I'm trying to unescape XML entities in a string in .NET (C#), but I can't get it to work correctly.

For example, if I have the string AT&amp;T, it should be translated to AT&T.

One way is to use HttpUtility.HtmlDecode(), but that's for HTML.

So I have two questions about this:

  1. Is it safe to use HttpUtility.HtmlDecode() for decoding XML entities?

  2. How do I use XmlReader (or something similar) to do this? I have tried the following, but it always returns an empty string:

    static string ReplaceEscapes(string text)
    {
        StringReader reader = new StringReader(text);
    
        XmlReaderSettings settings = new XmlReaderSettings();
    
        settings.ConformanceLevel = ConformanceLevel.Fragment;
    
        using (XmlReader xmlReader = XmlReader.Create(reader, settings))
        {
            return xmlReader.ReadString();
        }
    }
    

Solution

  • Your #2 solution can work, but you need to call xmlReader.Read(); (or xmlReader.MoveToContent();) before ReadString. A newly created reader is positioned before the first node, so ReadString has no text to return until you advance it.

    I guess #1 would also be acceptable, but there are edge cases like &reg;, which is a valid HTML entity but not an XML entity. What should your unescaper do with it? Throw an exception, as a proper XML parser would, or just return “®”, as the HTML parser would?
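
    To make the fix for #2 concrete, here is a minimal sketch of the question's ReplaceEscapes with the missing Read() call added. The class name XmlUnescape and the Main method are only illustrative. For the HTML comparison it uses System.Net.WebUtility.HtmlDecode rather than HttpUtility.HtmlDecode, on the assumption that you'd rather not reference System.Web; both treat &reg; as a known HTML entity.

```csharp
using System;
using System.IO;
using System.Net;
using System.Xml;

static class XmlUnescape
{
    // Same approach as in the question, plus the Read() call it was missing.
    public static string ReplaceEscapes(string text)
    {
        var settings = new XmlReaderSettings
        {
            // Fragment mode accepts bare text that has no root element.
            ConformanceLevel = ConformanceLevel.Fragment
        };

        using (var stringReader = new StringReader(text))
        using (XmlReader xmlReader = XmlReader.Create(stringReader, settings))
        {
            // A new reader sits before the first node; move onto the text node.
            xmlReader.Read();
            return xmlReader.ReadString();
        }
    }

    static void Main()
    {
        // Predefined XML entity: decoded by the XML reader.
        Console.WriteLine(ReplaceEscapes("AT&amp;T"));

        // HTML-only entity: the HTML decoder knows it...
        Console.WriteLine(WebUtility.HtmlDecode("&reg;"));

        // ...but the XML reader rejects it as an undeclared entity.
        try
        {
            ReplaceEscapes("&reg;");
        }
        catch (XmlException ex)
        {
            Console.WriteLine("XmlException: " + ex.Message);
        }
    }
}
```

    Note that this treats the input as XML text content, so a string containing a raw < or a bare & will either throw or be parsed as markup rather than being passed through unchanged.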