Tags: c#, c#-4.0, datacontractserializer, binaryformatter

DataContractSerializer vs BinaryFormatter performance


I was going through articles to understand more about the DataContractSerializer and BinaryFormatter serializers. Based on my reading so far, I was under the impression that BinaryFormatter should have a smaller footprint than DataContractSerializer, the reason being that DataContractSerializer serializes to an XML infoset while BinaryFormatter serializes to a proprietary binary format.

Here is the test:

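    using System;
    using System.Data;
    using System.IO;
    using System.Runtime.Serialization;
    using System.Runtime.Serialization.Formatters.Binary;

    [Serializable]   // required by BinaryFormatter
    [DataContract]   // opt-in contract for DataContractSerializer
    public class Packet
    {
        [DataMember]
        public DataSet Data { get; set; }
        [DataMember]
        public string Name { get; set; }
        [DataMember]
        public string Description { get; set; }
    }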

The DataSet was populated with 121317 rows from the [AdventureWorks].[Sales].[SalesOrderDetail] table.

    // Serialize with DataContractSerializer (XML infoset written as UTF-8 text)
    using (var fs = new FileStream("test1.txt", FileMode.Create))
    {
        var dcs = new DataContractSerializer(typeof(Packet));
        dcs.WriteObject(fs, packet);
        Console.WriteLine("Total bytes with dcs = " + fs.Length);
    }

    // Serialize the same object graph with BinaryFormatter
    using (var fs = new FileStream("test2.txt", FileMode.Create))
    {
        var bf = new BinaryFormatter();
        bf.Serialize(fs, packet);
        Console.WriteLine("Total bytes with binaryformatter = " + fs.Length);
    }

Results:

    Total bytes with dcs = 57133023
    Total bytes with binaryformatter = 57133984

Question: Why is the byte count for BinaryFormatter higher than for DataContractSerializer? Shouldn't it be much smaller?


Solution

  • DataSet has a bad habit: it implements ISerializable and, by default, serializes its contents as a string of XML, even when passed to a BinaryFormatter. This is why the two streams are nearly identical in size. If you set its RemotingFormat property to SerializationFormat.Binary, it still nests its contents inside the outer stream, but it does so by creating a new BinaryFormatter, dumping itself into a MemoryStream, and then storing the resulting byte array as a value in the outer BinaryFormatter's stream.

    Beyond that, BinaryFormatter records more type information, such as the full name of the assembly each type came from, and it adds per-object overhead on top of the XML for a DataSet.

    If you're trying to compare the behavior of the two serializers, DataSet is a poor choice because it takes over too much of its own serialization.
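A minimal sketch of the RemotingFormat switch, assuming .NET Framework 4.x to match the question's C# 4.0 tag. BinaryFormatter is obsolete from .NET 5 onward (warning SYSLIB0011) and its implementation was removed in .NET 9, where Serialize throws, so this will not run unmodified on current .NET. The table, its columns, the row count, and the class and method names are hypothetical stand-ins for the AdventureWorks data, not anything from the original test. It serializes the same DataSet twice with an outer BinaryFormatter, once with RemotingFormat at its default (Xml) and once set to Binary, and prints both byte counts.

```csharp
using System;
using System.Data;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

public static class RemotingFormatDemo
{
    // Hypothetical stand-in for SalesOrderDetail: an int, a decimal and a string column.
    public static DataSet BuildSampleDataSet(int rowCount)
    {
        var table = new DataTable("OrderDetail");
        table.Columns.Add("OrderId", typeof(int));
        table.Columns.Add("UnitPrice", typeof(decimal));
        table.Columns.Add("ProductCode", typeof(string));

        for (int i = 0; i < rowCount; i++)
        {
            table.Rows.Add(i, i * 1.25m, "P" + (i % 500));
        }

        var ds = new DataSet("Sample");
        ds.Tables.Add(table);
        ds.AcceptChanges(); // treat rows as loaded (Unchanged) rather than newly inserted
        return ds;
    }

    // Sets the DataSet's RemotingFormat (this mutates ds and propagates to its tables),
    // serializes it with an outer BinaryFormatter, and returns the stream length in bytes.
    public static long BinaryFormatterLength(DataSet ds, SerializationFormat format)
    {
        ds.RemotingFormat = format;
        using (var ms = new MemoryStream())
        {
            new BinaryFormatter().Serialize(ms, ds);
            return ms.Length;
        }
    }

    public static void Main()
    {
        DataSet ds = BuildSampleDataSet(10000);
        Console.WriteLine("RemotingFormat = Xml:    " + BinaryFormatterLength(ds, SerializationFormat.Xml));
        Console.WriteLine("RemotingFormat = Binary: " + BinaryFormatterLength(ds, SerializationFormat.Binary));
    }
}
```

For a numeric-heavy table like this one, the Binary setting should come out noticeably smaller because the row data is no longer carried as one large XML string; the exact ratio depends on the data, so it is worth measuring against your own DataSet.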