Deserializing XML to List<Object> - faster to combine into one xml file first?


I have a list of contracts that come in as xml strings like so:

<contract>
  <CONTRACTID>CN0425-3</CONTRACTID>
  <NAME>10425 - One-Year Contract</NAME>
  <WHENMODIFIED>02/01/2020 08:18:30</WHENMODIFIED>
</contract>

<contract>
  <CONTRACTID>CN0260-4</CONTRACTID>
  <NAME>10260 - One-Year Contract</NAME>
  <WHENMODIFIED>02/01/2020 08:18:30</WHENMODIFIED>
</contract>

I'm using this function to deserialize each item to an object:

 public static T ParseXML<T>(this string @this) where T : class
 {
      var reader = XmlReader.Create(@this.Trim().ToStream(), new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Document });
      var xmlRoot = new XmlRootAttribute { ElementName = typeof(T).Name.ToLower(), IsNullable = true };
      return new XmlSerializer(typeof(T), xmlRoot).Deserialize(reader) as T;
 }

Calling it like so:

// list is of type List<XElement> which contains a list of contracts
contracts.AddRange(from object e in list select e.ToString().ParseXML<Contract>() into e
                    select new Contract { Key = e.CONTRACTID, Name = e.NAME });

And here is my Contract class:

[SerializableAttribute]
[DesignerCategoryAttribute("code")]
[XmlTypeAttribute(AnonymousType = true)]
[XmlRootAttribute(Namespace = "", IsNullable = false)]
public class Contract
{
    public string CONTRACTID { get; set; }
    public string NAME { get; set; }
    public string WHENMODIFIED { get; set; }
}

The problem is when I have a large list (1000+ contracts), the deserialization process is slow because it has to go through each xml item. I am wondering if it would optimize performance to combine all the xml items into one file and then deserialize the whole thing to a list of objects. I could potentially combine the list of xml items like this:

<contracts>
    <contract>
      <CONTRACTID>CN0425-3</CONTRACTID>
      <NAME>10425 - One-Year Contract</NAME>
      <WHENMODIFIED>02/01/2020 08:18:30</WHENMODIFIED>
    </contract>
    <contract>
      <CONTRACTID>CN0260-4</CONTRACTID>
      <NAME>10260 - One-Year Contract</NAME>
      <WHENMODIFIED>02/01/2020 08:18:30</WHENMODIFIED>
    </contract>
 </contracts>

Do you guys know if that would benefit performance? And if so, how to combine the list of xml items and deserialize it?


Solution

  • Serialization is slow. Run a comparison and see whether the LINQ to XML approach below is faster:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Xml;
    using System.Xml.Linq;
    
    
    namespace ConsoleApplication1
    {
        class Program
        {
            const string FILENAME = @"c:\temp\test.xml";
            static void Main(string[] args)
            {
                XDocument doc = XDocument.Load(FILENAME);
    
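                // Project each <contract> element straight into a Contract object,
                // without going through XmlSerializer.
                var contracts = doc.Descendants("contract").Select(x => new Contract()
                {
                    CONTRACTID = (string)x.Element("CONTRACTID"),
                    NAME = (string)x.Element("NAME"),
                    // This cast throws if the element is missing or its text is not a parseable date.
                    WHENMODIFIED = (DateTime)x.Element("WHENMODIFIED")
                }).ToList(); // ToList() forces the lazy query to actually run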
    
            }
        }
        public class Contract
        {
            public string CONTRACTID { get; set; }
            public string NAME { get; set; }
            public DateTime WHENMODIFIED { get; set; }
        }
    }
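
Since the question already has a List<XElement> in memory rather than a file, here is a rough sketch of two ways to try, plus a simple timing harness for the comparison. It is only an illustration under some assumptions: the `ContractList` wrapper class, the `ContractParsing` helper and its method names, and the synthetic test data are all made up for this example. One thing worth checking in the original `ParseXML<T>`: per the XmlSerializer documentation, only the `XmlSerializer(Type)` and `XmlSerializer(Type, String)` constructors reuse their generated serialization assembly, so calling `new XmlSerializer(typeof(T), xmlRoot)` once per contract may well account for much of the slowness. The sketch builds one serializer and shares it.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Xml.Linq;
using System.Xml.Serialization;

public class Contract
{
    public string CONTRACTID { get; set; }
    public string NAME { get; set; }
    public string WHENMODIFIED { get; set; }
}

// Hypothetical wrapper type matching the combined <contracts> document.
[XmlRoot("contracts")]
public class ContractList
{
    [XmlElement("contract")]
    public List<Contract> Items { get; set; } = new List<Contract>();
}

public static class ContractParsing
{
    // Built once and shared, instead of one serializer per contract.
    private static readonly XmlSerializer ListSerializer =
        new XmlSerializer(typeof(ContractList));

    // Option 1: read the XElements directly, with no ToString()/re-parse round trip.
    public static List<Contract> FromElements(IEnumerable<XElement> elements) =>
        elements.Select(x => new Contract
        {
            CONTRACTID = (string)x.Element("CONTRACTID"),
            NAME = (string)x.Element("NAME"),
            WHENMODIFIED = (string)x.Element("WHENMODIFIED")
        }).ToList();

    // Option 2: wrap everything in a single <contracts> root and deserialize once.
    public static List<Contract> FromCombined(IEnumerable<XElement> elements)
    {
        // Copy each element so the caller's XElements are not re-parented.
        var combined = new XElement("contracts", elements.Select(e => new XElement(e)));
        using (var reader = combined.CreateReader())
        {
            return ((ContractList)ListSerializer.Deserialize(reader)).Items;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Synthetic input standing in for the question's List<XElement>.
        var list = Enumerable.Range(0, 1000).Select(i =>
            new XElement("contract",
                new XElement("CONTRACTID", "CN" + i),
                new XElement("NAME", i + " - One-Year Contract"),
                new XElement("WHENMODIFIED", "02/01/2020 08:18:30"))).ToList();

        var sw = Stopwatch.StartNew();
        var viaLinq = ContractParsing.FromElements(list);
        Console.WriteLine("LINQ to XML:   " + viaLinq.Count + " contracts in " + sw.Elapsed);

        sw.Restart();
        var viaSerializer = ContractParsing.FromCombined(list);
        Console.WriteLine("XmlSerializer: " + viaSerializer.Count + " contracts in " + sw.Elapsed);
    }
}
```

Note that the first XmlSerializer call includes a one-time setup cost, so for a fair comparison run each option a few times and ignore the first pass. If the contracts are already XElements, option 1 avoids both the string round trip and the serializer entirely, so it is probably the one to measure first.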