
Crash after serializing JObject with DataContractSerializer and XmlDictionaryWriter


I need to serialize a Newtonsoft JObject with DataContractSerializer, but it crashes with a stack overflow. How can I make it work? My code is:

var serializer = new DataContractSerializer(typeof(JObject));
MemoryStream stream1 = new MemoryStream();
var writer = XmlDictionaryWriter.CreateBinaryWriter(stream1);
var obj = new JObject();
serializer.WriteObject(writer, obj);
writer.Flush();

The following example converts JObject to a common type using the ISerializationSurrogateProvider functionality. It also crashes with a stack overflow.


using System;
using System.IO;
using Newtonsoft.Json.Linq;
using System.Runtime.Serialization;
using System.Xml;

class Program
{
    [DataContract(Name = "JTokenReference", Namespace = "urn:actors")]
    [Serializable]
    public sealed class JTokenReference
    {
        public JTokenReference()
        {
        }

        [DataMember(Name = "JType", Order = 0, IsRequired = true)]
        public JTokenType JType { get; set; }

        [DataMember(Name = "Value", Order = 1, IsRequired = true)]
        public string Value { get; set; }

        public static JTokenReference From(JToken jt)
        {
            if (jt == null)
            {
                return null;
            }
            return new JTokenReference()
            {
                Value = jt.ToString(),
                JType = jt.Type
            };
        }
        public object To()
        {
            switch (JType)
            {
                case JTokenType.Object:
                    {
                        return JObject.Parse(Value);
                    }
                case JTokenType.Array:
                    {
                        return JArray.Parse(Value);
                    }
                default:
                    {
                        return JToken.Parse(Value);
                    }
            }
        }
    }

    internal class ActorDataContractSurrogate : ISerializationSurrogateProvider
    {
        public static readonly ISerializationSurrogateProvider Instance = new ActorDataContractSurrogate();

        public Type GetSurrogateType(Type type)
        {
            if (typeof(JToken).IsAssignableFrom(type))
            {
                return typeof(JTokenReference);
            }

            return type;
        }

        public object GetObjectToSerialize(object obj, Type targetType)
        {
            if (obj == null)
            {
                return null;
            }
            else if (obj is JToken jt)
            {
                return JTokenReference.From(jt);
            }

            return obj;
        }

        public object GetDeserializedObject(object obj, Type targetType)
        {
            if (obj == null)
            {
                return null;
            }
            else if (obj is JTokenReference reference &&
                    typeof(JToken).IsAssignableFrom(targetType))
            {
                return reference.To();
            }
            return obj;
        }
    }

    [DataContract(Name = "Test", Namespace = "urn:actors")]
    [Serializable]
    public class Test
    {
        [DataMember(Name = "obj", Order = 0, IsRequired = false)]
        public JObject obj;
    }

    static void Main(string[] args)
    {
        var serializer = new DataContractSerializer(typeof(Test),
        new DataContractSerializerSettings()
        {
            MaxItemsInObjectGraph = int.MaxValue,
            KnownTypes = new Type[] { typeof(JTokenReference), typeof(JObject), typeof(JToken) },
        });

        serializer.SetSerializationSurrogateProvider(ActorDataContractSurrogate.Instance);

        MemoryStream stream1 = new MemoryStream();
        var writer = XmlDictionaryWriter.CreateBinaryWriter(stream1);
        var obj = new JObject();
        var test = new Test()
        {
            obj = obj,
        };
        serializer.WriteObject(writer, test);
        writer.Flush();
        Console.WriteLine(System.Text.Encoding.UTF8.GetString(stream1.GetBuffer(), 0, checked((int)stream1.Length)));
    }
}

I am trying to define a new type, JTokenReference, to replace JObject/JToken during serialization, but the crash happens before the replacement takes place. It seems the serializer fails to resolve the type.


Solution

  • TL;DR

    Your approach is reasonable and ought to work, but it fails due to what seems to be a bug in the ISerializationSurrogateProvider functionality with recursive collection types. You will need to change your design to use surrogate properties whenever you need to serialize a JToken, e.g. as follows:

    [IgnoreDataMember]
    public JObject obj { get; set; }
    
    [DataMember(Name = "obj", Order = 0, IsRequired = false)]
    string objSurrogate
    {
        get { return obj?.ToString(Newtonsoft.Json.Formatting.None); }
        set { obj = (value == null ? null : JObject.Parse(value)); }
    }
    

    Explanation

    The crash you are experiencing is a stack overflow, and it can be reproduced more simply as follows. When the data contract serializer writes a generic collection such as List<string>, it constructs a data contract name by combining the generic class and parameter names like so:

    • List<string>: ArrayOfstring
    • List<List<string>>: ArrayOfArrayOfstring
    • List<List<List<string>>>: ArrayOfArrayOfArrayOfstring
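These names can be checked without a debugger. The following is a minimal sketch, assuming .NET Core 3.0 or later with the in-box System.Runtime.Serialization; the names `ContractNameDemo` and `RootElementName` are made up for illustration. It serializes an empty instance of each collection type and reads back the local name of the root element that DataContractSerializer writes:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Xml;

public static class ContractNameDemo
{
    // Serializes an empty T and returns the local name of the root element,
    // which is the data contract name the serializer generated for T.
    public static string RootElementName<T>() where T : new()
    {
        var serializer = new DataContractSerializer(typeof(T));
        using var stream = new MemoryStream();
        using (var writer = XmlWriter.Create(stream))
        {
            serializer.WriteObject(writer, new T());
        } // disposing the writer flushes it; the stream stays open

        stream.Position = 0;
        using var reader = XmlReader.Create(stream);
        reader.MoveToContent(); // skip the XML declaration
        return reader.LocalName;
    }

    public static void Main()
    {
        Console.WriteLine(RootElementName<List<string>>());             // ArrayOfstring
        Console.WriteLine(RootElementName<List<List<string>>>());       // ArrayOfArrayOfstring
        Console.WriteLine(RootElementName<List<List<List<string>>>>()); // ArrayOfArrayOfArrayOfstring
    }
}
```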

    And so on. As the generic nesting gets deeper the name gets longer. Well then, what happens if we define a self-recursive collection type like the following?

    public class RecursiveList<T> : List<RecursiveList<T>>
    {
    }
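Why the name can never be finished can be illustrated with plain reflection. The sketch below is not the serializer's own code. It is a hypothetical helper (`ElementTypeChain`) that follows collection -> element type -> element type, roughly the path along which the ArrayOf... names are built, and reports whether that walk revisits a type. For `RecursiveList<string>`, the element type of its `IEnumerable<T>` is `RecursiveList<string>` itself, so the chain never reaches a leaf:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class RecursiveList<T> : List<RecursiveList<T>>
{
}

public static class ElementTypeChain
{
    // Returns the T of the IEnumerable<T> the type implements, or null if there is none.
    public static Type GetEnumerableElementType(Type type) =>
        type.GetInterfaces()
            .Where(i => i.IsGenericType && i.GetGenericTypeDefinition() == typeof(IEnumerable<>))
            .Select(i => i.GetGenericArguments()[0])
            .FirstOrDefault();

    // Walks the element-type chain and returns true if some type repeats,
    // i.e. a name built by prefixing "ArrayOf" at each step would never terminate.
    public static bool HasElementTypeCycle(Type type)
    {
        var visited = new HashSet<Type>();
        for (Type current = type;
             current != null && current != typeof(string); // treat string as a leaf, not as IEnumerable<char>
             current = GetEnumerableElementType(current))
        {
            if (!visited.Add(current))
                return true;
        }
        return false;
    }

    public static void Main()
    {
        Console.WriteLine(GetEnumerableElementType(typeof(RecursiveList<string>)) == typeof(RecursiveList<string>)); // True
        Console.WriteLine(HasElementTypeCycle(typeof(List<List<string>>)));  // False
        Console.WriteLine(HasElementTypeCycle(typeof(RecursiveList<string>))); // True
    }
}
```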
    

    Well, if we try to serialize one of these lists with the data contract serializer, it crashes with a stack overflow exception while trying to figure out the contract name. Demo fiddle #1 here; you will need to uncomment the line //Test(new RecursiveList<string>()); to see the crash:

    Stack overflow.
       at System.ModuleHandle.ResolveType(System.Runtime.CompilerServices.QCallModule, Int32, IntPtr*, Int32, IntPtr*, Int32, System.Runtime.CompilerServices.ObjectHandleOnStack)
       at System.ModuleHandle.ResolveTypeHandleInternal(System.Reflection.RuntimeModule, Int32, System.RuntimeTypeHandle[], System.RuntimeTypeHandle[])
       at System.Reflection.RuntimeModule.ResolveType(Int32, System.Type[], System.Type[])
       at System.Reflection.CustomAttribute.FilterCustomAttributeRecord(System.Reflection.MetadataToken, System.Reflection.MetadataImport ByRef, System.Reflection.RuntimeModule, System.Reflection.MetadataToken, System.RuntimeType, Boolean, ListBuilder`1<System.Object> ByRef, System.RuntimeType ByRef, System.IRuntimeMethodInfo ByRef, Boolean ByRef)
       at System.Reflection.CustomAttribute.IsCustomAttributeDefined(System.Reflection.RuntimeModule, Int32, System.RuntimeType, Int32, Boolean)
       at System.Reflection.CustomAttribute.IsDefined(System.RuntimeType, System.RuntimeType, Boolean)
       at System.Runtime.Serialization.CollectionDataContract.IsCollectionOrTryCreate(System.Type, Boolean, System.Runtime.Serialization.DataContract ByRef, System.Type ByRef, Boolean)
       at System.Runtime.Serialization.CollectionDataContract.IsCollectionHelper(System.Type, System.Type ByRef, Boolean)
       at System.Runtime.Serialization.DataContract.GetNonDCTypeStableName(System.Type)
       at System.Runtime.Serialization.DataContract.GetStableName(System.Type, Boolean ByRef)
       at System.Runtime.Serialization.DataContract.GetCollectionStableName(System.Type, System.Type, System.Runtime.Serialization.CollectionDataContractAttribute ByRef)
       at System.Runtime.Serialization.DataContract.GetNonDCTypeStableName(System.Type)
       at System.Runtime.Serialization.DataContract.GetStableName(System.Type, Boolean ByRef)
       at System.Runtime.Serialization.DataContract.GetCollectionStableName(System.Type, System.Type, System.Runtime.Serialization.CollectionDataContractAttribute ByRef)
       at System.Runtime.Serialization.DataContract.GetNonDCTypeStableName(System.Type)
       at System.Runtime.Serialization.DataContract.GetStableName(System.Type, Boolean ByRef)
    

    Oops. Well, what if we create a serialization surrogate, such as the following dummy surrogate for RecursiveList<string>?

    public class RecursiveListStringSurrogate
    {
        // A dummy surrogate that serializes nothing, for testing purposes.
    }
    
    public class RecursiveListStringSurrogateSelector : ISerializationSurrogateProvider
    {
        public object GetDeserializedObject(object obj, Type targetType)
        {
            if (obj is RecursiveListStringSurrogate)
                return new RecursiveList<string>();
            return obj;
        }
    
        public object GetObjectToSerialize(object obj, Type targetType)
        {
            if (obj is RecursiveList<string>)
                return new RecursiveListStringSurrogate();
            return obj;
        }
    
        public Type GetSurrogateType(Type type) 
        {
            if (type == typeof(RecursiveList<string>))
                return typeof(RecursiveListStringSurrogate);
            return type;
        }
    }
    

    Using that surrogate, an empty new RecursiveList<string>() can indeed be serialized successfully as the root object, producing:

    <RecursiveListStringSurrogate xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/" />
    

    Demo fiddle #2 here.

    OK, now let's try using the surrogate when a RecursiveList<string> is embedded in a model such as:

    public class Model
    {
        public RecursiveList<string> List { get; set; }
    }
    

    Well, when I try to serialize an instance of this model with an empty list, the crash comes back. Demo fiddle #3 here; you will need to uncomment the line //Test(new Model { List = new RecursiveList<string>() }); to see the crash.

    Oops again. It's not entirely clear why this fails. I can only speculate that, somewhere, Microsoft's code keeps a dictionary mapping original data contract names to surrogate data contract names, and that simply generating the dictionary key for the original type causes the stack overflow.

    Now what does this have to do with JObject and your Test class? Well, it turns out that JObject is another example of a recursive collection type. It implements IDictionary<string, JToken?>, and JToken in turn implements IEnumerable<JToken>, thereby triggering the same stack overflow we saw with the simple model containing a RecursiveList<string>.

    You might even want to report an issue to Microsoft about this (though I don't know whether they are still fixing bugs in the data contract serializer).

    Workaround

    To avoid this issue, you will need to modify your model(s) to use surrogate properties for JToken members as shown at the beginning of this answer:

    [DataContract(Name = "Test", Namespace = "urn:actors")]
    public class Test
    {
        [IgnoreDataMember]
        public JObject obj { get; set; }
        
        [DataMember(Name = "obj", Order = 0, IsRequired = false)]
        string objSurrogate
        {
            get { return obj?.ToString(Newtonsoft.Json.Formatting.None); }
            set { obj = (value == null ? null : JObject.Parse(value)); }
        }
    }
    

    This model can be serialized successfully as follows:

    var obj = new JObject();
    var test = new Test()
    {
        obj = obj,
    };
    
    var serializer = new DataContractSerializer(test.GetType());
    
    MemoryStream stream1 = new MemoryStream();
    var writer = XmlDictionaryWriter.CreateBinaryWriter(stream1);
    serializer.WriteObject(writer, test);
    writer.Flush();
    Console.WriteLine(System.Text.Encoding.UTF8.GetString(stream1.GetBuffer(), 0, checked((int)stream1.Length)));
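The snippet above only writes. As a self-contained sketch of the full round trip that needs no Newtonsoft reference, the following applies the same surrogate-property pattern to the RecursiveList<T> type from earlier: the real property carries no [DataMember], and a private [DataMember] string carries the data through a binary XmlDictionaryWriter/XmlDictionaryReader pair. The hypothetical `BracketCodec` class stands in for `JObject.ToString()`/`JObject.Parse()`; its bracket encoding is an assumption made for this demo and is lossless only because RecursiveList<T> never holds any T values, just nested lists.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

// The self-recursive collection type from the answer.
public class RecursiveList<T> : List<RecursiveList<T>>
{
}

[DataContract(Name = "Model", Namespace = "urn:demo")]
public class Model
{
    // No [DataMember], so the serializer never builds a contract for RecursiveList<string>.
    [IgnoreDataMember]
    public RecursiveList<string> List { get; set; }

    // Private surrogate property; only this member is written to and read from the XML.
    [DataMember(Name = "List", Order = 0, IsRequired = false)]
    private string ListSurrogate
    {
        get { return List == null ? null : BracketCodec.Encode(List); }
        set { List = value == null ? null : BracketCodec.Decode(value); }
    }
}

// Hypothetical codec: writes the shape of the list as nested brackets, e.g. "[[],[[]]]".
public static class BracketCodec
{
    public static string Encode(RecursiveList<string> list)
    {
        var sb = new StringBuilder();
        Write(list, sb);
        return sb.ToString();
    }

    private static void Write(RecursiveList<string> list, StringBuilder sb)
    {
        sb.Append('[');
        for (int i = 0; i < list.Count; i++)
        {
            if (i > 0) sb.Append(',');
            Write(list[i], sb); // assumes no null children
        }
        sb.Append(']');
    }

    public static RecursiveList<string> Decode(string text)
    {
        int pos = 0;
        return Read(text, ref pos);
    }

    // Assumes well-formed input; text[pos] is '[' on entry.
    private static RecursiveList<string> Read(string text, ref int pos)
    {
        var list = new RecursiveList<string>();
        pos++; // skip '['
        while (text[pos] != ']')
        {
            if (text[pos] == ',') pos++;
            list.Add(Read(text, ref pos));
        }
        pos++; // skip ']'
        return list;
    }
}

public static class RoundTripDemo
{
    // Writes the model with the binary XmlDictionaryWriter, then reads it back.
    public static Model RoundTrip(Model model)
    {
        var serializer = new DataContractSerializer(typeof(Model));
        var stream = new MemoryStream();
        var writer = XmlDictionaryWriter.CreateBinaryWriter(stream);
        serializer.WriteObject(writer, model);
        writer.Flush();

        stream.Position = 0;
        var reader = XmlDictionaryReader.CreateBinaryReader(stream, XmlDictionaryReaderQuotas.Max);
        return (Model)serializer.ReadObject(reader);
    }

    public static void Main()
    {
        var copy = RoundTrip(new Model { List = BracketCodec.Decode("[[],[[]]]") });
        Console.WriteLine(BracketCodec.Encode(copy.List)); // prints [[],[[]]]
    }
}
```

Since the serializer only builds contracts for the [DataMember] members of a [DataContract] type, the RecursiveList<string> property is never inspected, so this model should not hit the stack overflow seen with the Model class above.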
    

    Notes:

    • If you need to serialize a JToken as the root object, you can either wrap it in some container object or use the ActorDataContractSurrogate from your question. As we have seen, the surrogate functionality does seem to work for recursive collection types when they are the root object.

    • Since you are serializing to binary, for efficiency I suggest formatting the JObject with Formatting.None.

    • The surrogate property can be private as long as it is marked with [DataMember].

    Demo fiddle #4 here.