
XML to object using XML and DTD


I have an XML file:

<?xml version="1.0"?>
<!DOCTYPE report SYSTEM "01.dtd" [
    <!ENTITY parameter "blablabla"> 
]>

<report xmlns="http://tempuri.org/report"
  details="Something is described &parameter;"
></report>

I have tried to parse this XML into an object, but after deserializing, the details property contains: "Something is described &parameter;"

But I would like to get this result: "Something is described blablabla".

My code is the following:

using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Schema;

class Program
{
    static void Main(string[] args)
    {
        ReadXMLwithDTD();
    }

    public static void ReadXMLwithDTD()
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.XmlResolver = new XmlUrlResolver();
        settings.ValidationType = ValidationType.DTD;
        settings.DtdProcessing = DtdProcessing.Parse;
        settings.ValidationEventHandler += new ValidationEventHandler(ValidationCallBack);
        settings.IgnoreWhitespace = true;

        var files = Directory.GetFiles("../../../App_data/include/", "01.xml", SearchOption.AllDirectories);

        foreach (var file in files)
        {
            XmlDocument xmlDoc = new XmlDocument();

            using (StringReader sr = new StringReader(file))
            using (XmlReader reader = XmlReader.Create(sr, settings))
            {
                xmlDoc.Load(file);
            }

            report r = DeserializeToObject<report>(xmlDoc.OuterXml);
        }

        Console.ReadLine();
    }

    public static T DeserializeToObject<T>(string xml) where T : class
    {
        System.Xml.Serialization.XmlSerializer ser = new System.Xml.Serialization.XmlSerializer(typeof(T));

        MemoryStream memStream = new MemoryStream(Encoding.UTF8.GetBytes(xml));

        return (T)ser.Deserialize(memStream);
    }

    private static void ValidationCallBack(object sender, ValidationEventArgs e)
    {
        if (e.Severity == XmlSeverityType.Warning)
            Console.WriteLine("Warning: Matching schema not found.  No validation occurred." + e.Message);
        else // Error
            Console.WriteLine("Validation error: " + e.Message);
    }
}

What should I change?


Solution

  • There is no need to load your XML into an intermediate XmlDocument. XmlSerializer will expand entities during deserialization as long as you pass it an XmlReader configured with DtdProcessing.Parse.

    For instance, if I generalize your deserialization code a little, as follows:

    using System.Xml;
    using System.Xml.Schema;
    using System.Xml.Serialization;

    public static partial class XmlSerializationHelper
    {
        public static T LoadFromXmlWithDTD<T>(string filename, XmlSerializer serial = default, ValidationEventHandler validationCallBack = default)
        {
            var settings = new XmlReaderSettings
            {
                // Uncommenting the next line makes the reader try to load the external DTD "01.dtd".
                // If that file cannot be found, this throws:
                //   System.Xml.XmlException: An error has occurred while opening external DTD 'file:///app/01.dtd': Could not find file '/app/01.dtd'
                // XmlResolver = new XmlUrlResolver(),
                // Parse the DOCTYPE so that entities declared in its internal subset are expanded.
                DtdProcessing = DtdProcessing.Parse,
                IgnoreWhitespace = true,
            };
            // Adding a null handler is harmless when no callback is supplied.
            settings.ValidationEventHandler += validationCallBack;
            serial = serial ?? new XmlSerializer(typeof(T));
            using (var reader = XmlReader.Create(filename, settings))
                return (T)serial.Deserialize(reader);
        }
    }
    

    You can call it as follows:

    var report = XmlSerializationHelper.LoadFromXmlWithDTD<report>(filename, validationCallBack: ValidationCallBack);
    

    And Details will be correctly expanded:

    Assert.AreEqual("Something is described blablabla", report.Details);
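
    The question does not show the report class, so the sketch below assumes one: a hypothetical report type whose XmlRoot namespace matches xmlns="http://tempuri.org/report" and whose unprefixed details attribute maps to a Details property. It is a variant of the helper above that reads from a TextReader (for example a StringReader holding the XML text) instead of a file name, and it sets XmlResolver = null explicitly so the external 01.dtd is never fetched while the entity declared in the internal subset is still expanded. The names report, XmlSerializationHelperSketch and LoadFromTextWithDTD are illustrative assumptions.

    ```csharp
    using System;
    using System.IO;
    using System.Xml;
    using System.Xml.Serialization;

    // Hypothetical stand-in for the question's report class (not shown in the question).
    // The root namespace must match xmlns="http://tempuri.org/report".
    [XmlRoot("report", Namespace = "http://tempuri.org/report")]
    public class report
    {
        // The details attribute has no prefix, so it belongs to no namespace.
        [XmlAttribute("details")]
        public string Details { get; set; }
    }

    public static class XmlSerializationHelperSketch
    {
        // Deserializes from any TextReader, e.g. a StringReader over the XML text.
        public static T LoadFromTextWithDTD<T>(TextReader input)
        {
            var settings = new XmlReaderSettings
            {
                // Parse the DOCTYPE so entities from the internal subset are expanded.
                DtdProcessing = DtdProcessing.Parse,
                // Do not resolve external resources, so "01.dtd" need not exist.
                XmlResolver = null,
                IgnoreWhitespace = true,
            };
            var serializer = new XmlSerializer(typeof(T));
            using (var reader = XmlReader.Create(input, settings))
                return (T)serializer.Deserialize(reader);
        }
    }
    ```

    With the question's sample XML in a string xmlText, LoadFromTextWithDTD<report>(new StringReader(xmlText)).Details should come back as "Something is described blablabla". Passing the reader straight to the serializer avoids the OuterXml round trip entirely.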
    

    Notes:

    • You might want to set XmlReaderSettings.MaxCharactersFromEntities:

      This property allows you to mitigate denial of service attacks where the attacker submits XML documents that attempt to exceed memory limits via expanding entities. By limiting the characters that result from expanded entities, you can detect the attack and recover reliably.

    • In the following code:

      using (StringReader sr = new StringReader(file)) 
      using (XmlReader reader = XmlReader.Create(sr, settings))
      {
          xmlDoc.Load(file);
      }
      

      You create an XmlReader over a StringReader, which would parse the file name file as if it were an XML string rather than a path. You then ignore that reader entirely and load the file by name with xmlDoc.Load(file);, so the settings you just constructed are never applied. This may be the immediate cause of your bug.

    • Uncommenting XmlResolver = new XmlUrlResolver() makes the reader try to load the external DTD 01.dtd. If that file (which is not included in your question) cannot be found, an XmlException with the message Could not find file '/app/01.dtd' is thrown.
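
    To illustrate the MaxCharactersFromEntities note, here is a minimal sketch that parses an XML string with a cap on entity expansion. The class name, method name and the limits used in it are assumptions chosen for illustration. When expansion pushes past the limit, the reader throws an XmlException; the sketch catches it, so be aware that a malformed document is reported the same way.

    ```csharp
    using System;
    using System.IO;
    using System.Xml;

    public static class EntityLimitSketch
    {
        // Returns true if the whole document can be read within the entity-expansion limit.
        // Returns false on any XmlException: exceeding the limit, but also malformed XML.
        public static bool ParsesWithinEntityLimit(string xml, long maxCharactersFromEntities)
        {
            var settings = new XmlReaderSettings
            {
                DtdProcessing = DtdProcessing.Parse,          // allow the internal DTD subset
                XmlResolver = null,                           // never fetch external DTDs
                MaxCharactersFromEntities = maxCharactersFromEntities, // 0 means no limit
            };
            try
            {
                using (var reader = XmlReader.Create(new StringReader(xml), settings))
                {
                    // Read to the end so every entity reference actually gets expanded.
                    while (reader.Read()) { }
                }
                return true;
            }
            catch (XmlException)
            {
                return false;
            }
        }
    }
    ```

    A nested-entity document such as <!ENTITY a "0123456789"><!ENTITY b "&a;&a;..."> multiplies the expanded text at each level, which is exactly the kind of growth this limit is meant to stop.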

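
    If you do want to keep the intermediate XmlDocument from your original code, one way to address the bug noted above is to load the document through the configured reader rather than by file name. This is a sketch under assumptions: it reads from a TextReader, it skips the external DTD with XmlResolver = null, and the class and method names are hypothetical.

    ```csharp
    using System;
    using System.IO;
    using System.Xml;

    public static class XmlDocumentRouteSketch
    {
        // Loads an XmlDocument through a reader that really uses the DTD settings,
        // unlike xmlDoc.Load(file), which bypasses them.
        public static XmlDocument LoadWithDtd(TextReader input)
        {
            var settings = new XmlReaderSettings
            {
                DtdProcessing = DtdProcessing.Parse, // expand internal-subset entities
                XmlResolver = null,                  // do not fetch the external DTD
                IgnoreWhitespace = true,
            };
            var xmlDoc = new XmlDocument();
            using (XmlReader reader = XmlReader.Create(input, settings))
            {
                // Load from the reader so the settings above take effect.
                xmlDoc.Load(reader);
            }
            return xmlDoc;
        }
    }
    ```

    You could then deserialize with serializer.Deserialize(new XmlNodeReader(xmlDoc)) instead of round-tripping through OuterXml and a MemoryStream.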