
Deserialize Comments using Custom IXmlSerializable


I am attempting to serialize my Description property to an XML comment. To do this I have implemented IXmlSerializable, and the following WriteXml produces exactly the XML I want.

[Serializable]
public sealed class Setting<T> : SettingBase, IXmlSerializable
{
    public Setting() { }

    public Setting(T value, string description)
    {
        Value = value;
        Description = description;
    }

    public Setting(string command, T value, string description)
        : this(value, description)
    {
        Command = command;
    }

    public XmlSchema GetSchema()
    {
        return null;
    }

    public void ReadXml(XmlReader reader)
    {
    }

    public void WriteXml(XmlWriter writer)
    {
        var properties = GetType().GetProperties();
        foreach (var propertyInfo in properties)
        {
            if (propertyInfo.IsDefined(typeof(XmlCommentAttribute), false))
                writer.WriteComment(Description);
            else if (!propertyInfo.CustomAttributes.Any((attr) => attr.AttributeType.Equals(typeof(XmlIgnoreAttribute))))
                writer.WriteElementString(propertyInfo.Name, propertyInfo.GetValue(this, null)?.ToString());
        }
    }

    [XmlComment, Browsable(false)]
    public string Description { get; set; }

    [XmlElement, Browsable(false)]
    public string Command { get; set; }

    [XmlElement, Browsable(false)]
    public T Value { get; set; }

    [XmlIgnore]
    public override object ValueUntyped { get { return Value; } }
}

[AttributeUsage(AttributeTargets.Property, AllowMultiple = false)]
public class XmlCommentAttribute : Attribute {}

However, after many attempts at implementing ReadXml, I still cannot deserialize the Description comment.

How can I implement ReadXml to deserialize my class?


Solution

  • When implementing IXmlSerializable you need to adhere to the rules stated in Marc Gravell's answer to "Proper way to implement IXmlSerializable?" as well as in the documentation:

    For IXmlSerializable.WriteXml(XmlWriter):

    The WriteXml implementation you provide should write out the XML representation of the object. The framework writes a wrapper element and positions the XML writer after its start. Your implementation may write its contents, including child elements. The framework then closes the wrapper element.

    For IXmlSerializable.ReadXml(XmlReader):

    The ReadXml method must reconstitute your object using the information that was written by the WriteXml method.

    When this method is called, the reader is positioned on the start tag that wraps the information for your type. That is, directly on the start tag that indicates the beginning of a serialized object. When this method returns, it must have read the entire element from beginning to end, including all of its contents. Unlike the WriteXml method, the framework does not handle the wrapper element automatically. Your implementation must do so. Failing to observe these positioning rules may cause code to generate unexpected runtime exceptions or corrupt data.
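
To make these positioning rules concrete, here is a minimal sketch, not taken from the question or the linked answers. The `Note` type and its `Text` property are hypothetical and exist only for illustration. WriteXml writes contents only, while ReadXml consumes the wrapper start tag, the contents, and the wrapper end tag itself.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical type, used only to illustrate the positioning rules.
public sealed class Note : IXmlSerializable
{
    public string Text { get; set; }

    public XmlSchema GetSchema() { return null; }

    public void WriteXml(XmlWriter writer)
    {
        // The framework has already written <Note>; only the contents go here.
        writer.WriteElementString("Text", Text);
    }

    public void ReadXml(XmlReader reader)
    {
        // The reader is positioned on <Note>. An empty wrapper (<Note />) has
        // no end tag, so a single Read() consumes it completely.
        if (reader.IsEmptyElement)
        {
            reader.Read();
            return;
        }
        reader.ReadStartElement(); // consume <Note>
        while (reader.NodeType != XmlNodeType.EndElement && !reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "Text")
                Text = reader.ReadElementContentAsString(); // advances past </Text>
            else
                reader.Skip(); // whitespace, comments, unknown elements
        }
        reader.ReadEndElement(); // consume </Note>
    }
}

public static class NoteDemo
{
    // Serializes a Note to a string and reads it back.
    public static string RoundTrip(string text)
    {
        var serializer = new XmlSerializer(typeof(Note));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, new Note { Text = text });
            using (var reader = new StringReader(writer.ToString()))
                return ((Note)serializer.Deserialize(reader)).Text;
        }
    }
}
```

Note that this sketch simply skips comment nodes; handling them is what the rest of this answer adds.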

    Writing a ReadXml() that correctly handles edge cases such as out-of-order or unexpected elements, missing or extra whitespace, and empty elements turns out to be very tricky. It therefore makes sense to adopt a parsing framework that iterates through the XML tree correctly, such as the one from the answer to "Why does XmlSerializer throws an Exception and raise a ValidationEvent when a schema validation error occurs inside IXmlSerializable.ReadXml()", and to extend it to handle comment nodes:

    public static class XmlSerializationExtensions
    {
        // Adapted from the answer https://stackoverflow.com/a/60498500/3744182
        // to the question https://stackoverflow.com/questions/60449088/why-does-xmlserializer-throws-an-exception-and-raise-a-validationevent-when-a-sc
        // and extended here to handle comment nodes.
        public static void ReadIXmlSerializable(XmlReader reader, Func<XmlReader, bool> handleXmlAttribute, Func<XmlReader, bool> handleXmlElement, Func<XmlReader, bool> handleXmlText, Func<XmlReader, bool> handleXmlComment)
        {
            //https://learn.microsoft.com/en-us/dotnet/api/system.xml.serialization.ixmlserializable.readxml?view=netframework-4.8#remarks
            //When this method is called, the reader is positioned on the start tag that wraps the information for your type. 
            //That is, directly on the start tag that indicates the beginning of a serialized object. 
            //When this method returns, it must have read the entire element from beginning to end, including all of its contents. 
            //Unlike the WriteXml method, the framework does not handle the wrapper element automatically. Your implementation must do so. 
            //Failing to observe these positioning rules may cause code to generate unexpected runtime exceptions or corrupt data.
            reader.MoveToContent();
            if (reader.NodeType != XmlNodeType.Element)
                throw new XmlException(string.Format("Invalid NodeType {0}", reader.NodeType));
            if (reader.HasAttributes)
            {
                for (int i = 0; i < reader.AttributeCount; i++)
                {
                    reader.MoveToAttribute(i);
                    handleXmlAttribute(reader);
                }
                reader.MoveToElement(); // Moves the reader back to the element node.
            }
            if (reader.IsEmptyElement)
            {
                reader.Read();
                return;
            }
            reader.ReadStartElement(); // Advance to the first sub element of the wrapper element.
            while (reader.NodeType != XmlNodeType.EndElement)
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    using (var subReader = reader.ReadSubtree())
                    {
                        subReader.MoveToContent();
                        handleXmlElement(subReader);
                    }
                    // ReadSubtree() leaves the reader positioned ON the end of the element, so read that also.
                    reader.Read();
                }
                else if (reader.NodeType == XmlNodeType.Text || reader.NodeType == XmlNodeType.CDATA)
                {
                    var type = reader.NodeType;
                    handleXmlText(reader);
                    // Ensure that the reader was not advanced.
                    if (reader.NodeType != type)
                        throw new XmlException(string.Format("handleXmlText incorrectly advanced the reader to a new node {0}", reader.NodeType));
                    reader.Read();
                }
                else if (reader.NodeType == XmlNodeType.Comment)
                {
                    var type = reader.NodeType;
                    handleXmlComment(reader);
                    // Ensure that the reader was not advanced.
                    if (reader.NodeType != type)
                        throw new XmlException(string.Format("handleXmlComment incorrectly advanced the reader to a new node {0}", reader.NodeType));
                    reader.Read();
                }
                else // Whitespace, etc.
                {
                    // Skip() leaves the reader positioned AFTER the end of the node.
                    reader.Skip();
                }
            }
            // Move past the end of the wrapper element
            reader.ReadEndElement();
        }
    
        public static void ReadIXmlSerializable(XmlReader reader, Func<XmlReader, bool> handleXmlAttribute, Func<XmlReader, bool> handleXmlElement, Func<XmlReader, bool> handleXmlText)
        {
            ReadIXmlSerializable(reader, handleXmlAttribute, handleXmlElement, handleXmlText, r => true);
        }
    
        public static void WriteIXmlSerializable(XmlWriter writer, Action<XmlWriter> writeAttributes, Action<XmlWriter> writeNodes)
        {
            //https://learn.microsoft.com/en-us/dotnet/api/system.xml.serialization.ixmlserializable.writexml?view=netframework-4.8#remarks
            //The WriteXml implementation you provide should write out the XML representation of the object. 
            //The framework writes a wrapper element and positions the XML writer after its start. Your implementation may write its contents, including child elements. 
            //The framework then closes the wrapper element.
            writeAttributes(writer);
            writeNodes(writer);
        }
    }
    
    public static class XmlSerializerFactory
    {
        // To avoid a memory leak the serializer must be cached.
        // https://stackoverflow.com/questions/23897145/memory-leak-using-streamreader-and-xmlserializer
        // This factory taken from 
        // https://stackoverflow.com/questions/34128757/wrap-properties-with-cdata-section-xml-serialization-c-sharp/34138648#34138648
    
        readonly static Dictionary<Tuple<Type, string, string>, XmlSerializer> cache;
        readonly static object padlock;
    
        static XmlSerializerFactory()
        {
            padlock = new object();
            cache = new Dictionary<Tuple<Type, string, string>, XmlSerializer>();
        }
    
        public static XmlSerializer Create(Type serializedType, string rootName, string rootNamespace)
        {
            if (serializedType == null)
                throw new ArgumentNullException(nameof(serializedType));
            if (rootName == null && rootNamespace == null)
                return new XmlSerializer(serializedType);
            lock (padlock)
            {
                XmlSerializer serializer;
                var key = Tuple.Create(serializedType, rootName, rootNamespace);
                if (!cache.TryGetValue(key, out serializer))
                    cache[key] = serializer = new XmlSerializer(serializedType, new XmlRootAttribute { ElementName = rootName, Namespace = rootNamespace });
                return serializer;
            }
        }
    }
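
The helper above hands each child element to its handler through ReadSubtree(). The following sketch, with a hypothetical class name and sample XML, illustrates why that is useful: even when a handler stops reading part-way through an element, disposing the subtree reader leaves the outer reader on that element's end tag, so the outer loop can continue from a known position.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ReadSubtreeSketch
{
    // Returns the local name of the node the outer reader reaches after a
    // "handler" that deliberately abandons <Value> without reading all of it.
    public static string NodeAfterPartialRead(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();    // on the wrapper start tag
            reader.ReadStartElement(); // on the first child element
            using (var subReader = reader.ReadSubtree())
            {
                subReader.MoveToContent(); // on the child element
                subReader.Read();          // one node in, then stop early
            }
            // Closing the subtree reader positions the outer reader on the
            // child's end tag, so one Read() moves past the whole child.
            reader.Read();
            reader.MoveToContent();
            return reader.LocalName;
        }
    }
}
```

For the input `<Setting><Value><Item>1</Item><Item>2</Item></Value><Command>run</Command></Setting>`, the method returns `Command`, even though the simulated handler read only as far as the first `<Item>`.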
    

    Then modify your class to use it as follows:

    [Serializable]
    public sealed class Setting<T> : SettingBase, IXmlSerializable
    {
        public Setting() { }
    
        public Setting(T value, string description)
        {
            Value = value;
            Description = description;
        }
    
        public Setting(string command, T value, string description)
            : this(value, description)
        {
            Command = command;
        }
    
        public XmlSchema GetSchema() { return null;}
    
        public void ReadXml(XmlReader reader)
        {
            XmlSerializationExtensions.ReadIXmlSerializable(reader, r => true,
                r =>
                {
                    switch (r.LocalName)
                    {
                        case "Command":
                            Command = r.ReadElementContentAsString();
                            break;
                        case "Value":
                            var serializer = XmlSerializerFactory.Create(typeof(T), "Value", r.NamespaceURI);
                            Value = (T)serializer.Deserialize(r);
                            break;
                    }
                    return true;
                },
                r => true, r => { Description += r.Value; return true; });
        }
    
        public void WriteXml(XmlWriter writer)
        {
            XmlSerializationExtensions.WriteIXmlSerializable(writer, w => { },
                w =>
                {
                    if (Description != null)
                        w.WriteComment(Description);
                    if (Command != null)
                        w.WriteElementString("Command", Command);
                    if (Value != null)
                    {
                        var serializer = XmlSerializerFactory.Create(typeof(T), "Value", null);
                        serializer.Serialize(w, Value);
                    }
                });
        }
    
        public string Description { get; set; }
    
        public string Command { get; set; }
    
        public T Value { get; set; }
    
        public override object ValueUntyped { get { return Value; } }
    }
    
    // ABSTRACT BASE CLASS NOT INCLUDED IN QUESTION, THIS IS JUST A GUESS
    [Serializable]
    public abstract class SettingBase
    {
        public abstract object ValueUntyped { get; }
    }
    

    And you will be able to round-trip it to XML.
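
One way to check the round trip is a pair of generic helpers like the ones sketched below. The class and method names are hypothetical and are not part of the original answer. They construct the serializer with the single-argument XmlSerializer(Type) constructor.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public static class RoundTripSketch
{
    // Serializes any XmlSerializer-compatible value to an XML string.
    public static string ToXml<T>(T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }

    // Deserializes an XML string produced by ToXml<T>.
    public static T FromXml<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var reader = new StringReader(xml))
            return (T)serializer.Deserialize(reader);
    }
}
```

With the classes from this answer in scope, something like `RoundTripSketch.FromXml<Setting<int>>(RoundTripSketch.ToXml(new Setting<int>("port", 8080, "Listening port")))` should return a copy whose Description, Command and Value match the original. The argument values here are made up for illustration.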

    Notes:

    • Since your class is sealed I replaced use of reflection with direct access to properties to serialize.

    • In your version you serialize your T Value to XML by writing its ToString() value:

      writer.WriteElementString(propertyInfo.Name, propertyInfo.GetValue(this, null)?.ToString());
      

      Unless the value is itself a string, this is likely to produce a wrong result:

      • Numeric, DateTime, TimeSpan and similar primitives will be formatted using the current culture. XML primitives should always be formatted in a culture-invariant manner.

      • Complex objects such as string[] that do not override ToString() will be written as their type name (for example System.String[]) rather than their contents.


      To avoid these problems my version serializes the value to XML by constructing an appropriate XmlSerializer. This is more robust but may be slower than your version. If performance matters here, you could check for known types (such as string) and format them to XML manually, e.g. with the utility class XmlConvert.

    • XmlReader.ReadSubtree() is used to ensure that the outer XmlReader is not mispositioned by the handleXmlElement callback, whatever that callback reads or leaves unread.
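
The culture problem mentioned in the notes above can be illustrated with a short sketch. The class and method names are hypothetical. To keep the result independent of the machine's installed cultures, a comma-decimal culture is simulated by cloning the invariant number format and changing its decimal separator.

```csharp
using System;
using System.Globalization;
using System.Xml;

public static class InvariantFormattingSketch
{
    // Simulates ToString() under a culture that uses ',' as decimal separator.
    public static string FormatWithCommaCulture(double value)
    {
        var format = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
        format.NumberDecimalSeparator = ",";
        return value.ToString(format);
    }

    // Culture-invariant formatting suitable for XML content.
    public static string FormatForXml(double value)
    {
        return XmlConvert.ToString(value);
    }

    // Parses the culture-invariant form back into a double.
    public static double ParseFromXml(string text)
    {
        return XmlConvert.ToDouble(text);
    }
}
```

Here `FormatWithCommaCulture(1.5)` yields `1,5`, which XmlConvert.ToDouble would not read back as 1.5, whereas `FormatForXml(1.5)` yields `1.5` and parses back to the same value regardless of the current culture.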

    Demo fiddle here.