
Force binary deserialization to fail when type modified


I'm looking for a non-intrusive way to force deserialization to fail under the following circumstances:

  • The type is not defined in a strongly named assembly.
  • BinaryFormatter is used.
  • The type has been modified since the data was serialized (e.g. a property has been added).

Below is an illustration/repro of the problem in the form of a failing NUnit test. I'm looking for a generic way to make it pass without modifying the Data class, preferably just by configuring the BinaryFormatter during serialization and/or deserialization. I also don't want to involve serialization surrogates, as they are likely to require specific knowledge of each affected type.

I can't find anything in the MSDN docs that helps, though.

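using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;
using NUnit.Framework;

[Serializable]
public class Data
{
    public string S { get; set; }
}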

public class DataSerializationTests
{
    /// <summary>
    /// This string contains a Base64 encoded serialized instance of the
    /// original version of the Data class with no members:
    /// [Serializable]
    /// public class Data
    /// { }
    /// </summary>
    private const string Base64EncodedEmptyDataVersion =
        "AAEAAAD/////AQAAAAAAAAAMAgAAAEtTc2MuU3Rvcm0uRGF0YS5UZXN0cywgV"+
        "mVyc2lvbj0xLjAuMC4wLCBDdWx0dXJlPW5ldXRyYWwsIFB1YmxpY0tleVRva2"+
        "VuPW51bGwFAQAAABlTc2MuU3Rvcm0uRGF0YS5UZXN0cy5EYXRhAAAAAAIAAAAL";

    [Test]
    public void Deserialize_FromOriginalEmptyVersionFails()
    {
        var binaryFormatter = new BinaryFormatter();
        var memoryStream = new MemoryStream(Convert.FromBase64String(Base64EncodedEmptyDataVersion));

        memoryStream.Seek(0L, SeekOrigin.Begin);

        Assert.That(
            () => binaryFormatter.Deserialize(memoryStream),
            Throws.Exception
        );
    }
}

Solution

  • I'd recommend the "Java" way here (compare Java's serialVersionUID): declare an int field such as private int _Serializable = 0; in every single serializable class, increase it manually whenever you change the class's properties, and on deserialization check that the serialized version matches the current one. If you insist on an automated approach, you'll have to store a lot of metadata and check whether the current metadata matches the persisted metadata, which adds a burden on performance and on the size of the serialized data.

    Here is an automatic descriptor. You store a TypeDescriptor instance as part of your binary data and, on retrieval, check whether the persisted TypeDescriptor is still valid for deserialization against the current TypeDescriptor (IsValidForSerialization):

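    // persistedDescriptor is the descriptor stored alongside your data; Foo is your serializable type.
    var persistedDescriptor = ...;
    var currentDescriptor = TypeDescriptor.Describe(typeof(Foo));
    bool isValid = persistedDescriptor.IsValidForSerialization(currentDescriptor);
    
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Reflection;
    using System.Runtime.Serialization;
    
    [Serializable]
    // IsReference = true lets DataContractSerializer persist the cyclic descriptor
    // graphs that Describe() produces for self-referencing types.
    [DataContract(IsReference = true)]
    public partial class TypeDescriptor
    {
      [DataMember]
      public string TypeName { get; set; }
      [DataMember]
      public IList<FieldDescriptor> Fields { get; set; }
    
      public TypeDescriptor()
      {
        Fields = new List<FieldDescriptor>();
      }
    
      // True if every persisted field still exists (by name) in the current type with a
      // compatible type. Fields that exist only in the current type are not checked.
      public bool IsValidForSerialization(TypeDescriptor currentType)
      {
        return IsValidForSerialization(currentType, new HashSet<TypeDescriptor>());
      }
    
      private bool IsValidForSerialization(TypeDescriptor currentType, HashSet<TypeDescriptor> visited)
      {
        // Descriptor graphs can be cyclic (e.g. a linked-list node type), so a descriptor
        // that is already being checked is not checked again (avoids infinite recursion).
        if (!visited.Add(this))
        {
          return true;
        }
        if (!string.Equals(TypeName, currentType.TypeName, StringComparison.Ordinal))
        {
          return false;
        }
        foreach (var field in Fields)
        {
          var mirrorField = currentType.Fields.FirstOrDefault(f => string.Equals(f.FieldName, field.FieldName, StringComparison.Ordinal));
          if (mirrorField == null)
          {
            return false;
          }
          if (!field.Type.IsValidForSerialization(mirrorField.Type, visited))
          {
            return false;
          }
        }
        return true;
      }
    }
    
    [Serializable]
    [DataContract]
    public class FieldDescriptor
    {
      [DataMember]
      public TypeDescriptor Type { get; set; }
      [DataMember]
      public string FieldName { get; set; }
    }
    
    // Factory methods, kept in a second part of the same partial class.
    public partial class TypeDescriptor
    {
      private static TypeDescriptor Describe(Type type, IDictionary<Type, TypeDescriptor> knownTypes)
      {
        // Reuse an existing descriptor; this also stops recursion on self-referencing types.
        if (knownTypes.TryGetValue(type, out var known))
        {
          return known;
        }
    
        var descriptor = new TypeDescriptor { TypeName = type.FullName, Fields = new List<FieldDescriptor>() };
        knownTypes.Add(type, descriptor);
        // Primitives and strings are treated as leaves.
        if (!type.IsPrimitive && type != typeof(string))
        {
          // Public and non-public instance fields, in a stable ordinal order.
          foreach (var field in type.GetFields(BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public).OrderBy(f => f.Name, StringComparer.Ordinal))
          {
            // [NonSerialized] fields are not written to the stream, so skip them.
            if (field.IsDefined(typeof(NonSerializedAttribute), false))
            {
              continue;
            }
    
            descriptor.Fields.Add(new FieldDescriptor { FieldName = field.Name, Type = Describe(field.FieldType, knownTypes) });
          }
        }
        return descriptor;
      }
    
      public static TypeDescriptor Describe(Type type)
      {
        return Describe(type, new Dictionary<Type, TypeDescriptor>());
      }
    }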
    

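    I also thought about a mechanism for shortening the persisted metadata, such as storing an MD5 hash of the XML- or JSON-serialized TypeDescriptor; but in that case any new property/field will mark your object as incompatible. (IsValidForSerialization, by contrast, only requires each persisted field to still exist with a compatible type, so it accepts added fields and rejects removed or retyped ones.)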
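A minimal sketch of the version-field ("Java way") suggestion from the start of this answer, assuming the check is done in an [OnDeserialized] callback. VersionedData, CurrentVersion, _serializedVersion and CheckVersion are hypothetical names, not part of any library. The sketch also assumes the formatter leaves a field that is missing from the stream at its default value, which is what happens in the question's scenario, where BinaryFormatter tolerates the missing member.

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical example type: a hand-maintained version number travels with the data.
[Serializable]
public class VersionedData
{
    // Increase by hand whenever the set of serialized fields changes.
    // Starts at 1 (never 0) so that streams written before this field existed are rejected.
    internal const int CurrentVersion = 1;

    // Serialized like any other field. Formatter-based deserialization does not run
    // constructors or field initializers, so after deserialization this holds the
    // persisted value, or 0 if the stream did not contain the field at all.
    private int _serializedVersion = CurrentVersion;

    public string S { get; set; }

    // Invoked by the formatter after the object has been deserialized.
    [OnDeserialized]
    internal void CheckVersion(StreamingContext context)
    {
        if (_serializedVersion != CurrentVersion)
        {
            throw new SerializationException(
                $"Stream contains version {_serializedVersion} of {nameof(VersionedData)}, " +
                $"but the current version is {CurrentVersion}.");
        }
    }
}
```

Nothing has to be configured on the BinaryFormatter itself; the price is that every class carries the field and someone has to remember to bump it, which is the kind of intrusiveness the question hoped to avoid.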
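The hash idea from the last paragraph can be sketched without going through XML or JSON: the hypothetical TypeFingerprint.Compute below builds a canonical "name:type" string from the instance fields a formatter would write (skipping [NonSerialized] ones) and hashes it with MD5. This is an illustrative simplification, not a drop-in replacement for TypeDescriptor: it is shallow (it does not recurse into field types) and it omits the type name, which the formatter records anyway.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Security.Cryptography;
using System.Text;

public static class TypeFingerprint
{
    // Hashes the serializable instance-field layout of a type.
    // Like Describe, GetFields does not return private fields declared on base classes.
    // MD5 serves only as a compact change detector here, not for security.
    public static string Compute(Type type)
    {
        string layout = string.Join(";",
            type.GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)
                .Where(f => !Attribute.IsDefined(f, typeof(NonSerializedAttribute)))
                .OrderBy(f => f.Name, StringComparer.Ordinal)
                .Select(f => f.Name + ":" + f.FieldType.FullName));

        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(layout));
            // Uppercase hex without separators, e.g. 32 characters for MD5.
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}
```

You would persist the fingerprint next to the payload and refuse to deserialize when TypeFingerprint.Compute(typeof(Data)) no longer matches it. Adding S to the empty Data class, for instance, adds its compiler-generated backing field and therefore changes the hash, which is the strict behaviour the question asks for.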