c# json.net

Deserializing nested dictionaries where keys (strings) are interned?


Suppose I have an existing concrete class that I want to deserialize some JSON into:

public class Jeff
    {
        [JsonProperty("string_to_int")]
        public Dictionary<string, Dictionary<string, Dictionary<string, int>>> StringToInt;
    }

Let's say the string keys at all levels of this data structure are from a finite set of words, and I want to string.Intern() all of them. Is there an easy way to intern all the string keys directly in my Deserialize<> call please?


Solution

  • You would like to use something like AutoInterningStringConverter from the question "string Intern on serializer.Deserialize<T>()" to intern the dictionary keys automatically as they are deserialized. However, Json.NET will not invoke a custom JsonConverter for dictionary keys[1].

    Thus you will need to create a custom JsonConverter for all Dictionary<string, TValue> types which automatically interns the keys. The following does this:

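    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading;
    using Newtonsoft.Json;

    public class AutoInterningDictionaryKeyConverter : JsonConverter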
    {
        const int DefaultMaxToIntern = 64;
        const int MaxLengthToIntern = 2048; // Modify as required
        
        int maxToIntern;
    
        public AutoInterningDictionaryKeyConverter() : this(DefaultMaxToIntern) { }
        public AutoInterningDictionaryKeyConverter(int maxToIntern) => this.maxToIntern = maxToIntern;
    
        bool CanConvert(Type objectType, [System.Diagnostics.CodeAnalysis.NotNullWhen(returnValue: true)] out Type? valueType)
        {
            if (typeof(IDictionary).IsAssignableFrom(objectType)
                && (objectType.GetDictionaryKeyValueType() is var keyValueTypes && keyValueTypes is not null)
                && keyValueTypes[0] == typeof(string)
                && objectType.GetConstructor(Type.EmptyTypes) is not null)
            {
                valueType = keyValueTypes[1];
                return true;
            }
            valueType = null;
            return false;
        }
        
        public override bool CanConvert(Type objectType) => CanConvert(objectType, out var _);
    
        public override object? ReadJson(JsonReader reader, Type objectType, object? existingValue, JsonSerializer serializer)
        {
            if (reader.MoveToContentAndAssert().TokenType == JsonToken.Null)
                return null;
            if (reader.TokenType != JsonToken.StartObject)
                throw new JsonSerializationException(string.Format("Unexpected token {0}", reader.TokenType));
            // Here we take advantage of the fact that Dictionary<TKey, TValue> implements IDictionary.  
            // Recall that, in CanConvert, we checked to make sure the dictionary had a public parameterless constructor, so DefaultCreator won't be null
            var dictionary = existingValue as IDictionary ?? (IDictionary)(serializer.ContractResolver.ResolveContract(objectType).DefaultCreator!());
            var keyValueTypes = objectType.GetDictionaryKeyValueType().ThrowOnNull();
            while (reader.ReadToContentAndAssert().TokenType != JsonToken.EndObject)
            {
                var name = (string)reader.AssertTokenType(JsonToken.PropertyName).Value.ThrowOnNull();
                if (String.IsInterned(name) is var s && s is not null)
                    name = s;
                else if (name.Length <= MaxLengthToIntern)
                {
                    if (Interlocked.Decrement(ref maxToIntern) >= 0)
                        name = string.Intern(name);
                    else
                        Volatile.Write(ref maxToIntern, 0); // Don't let maxToIntern underflow past int.MinValue (extremely unlikely, but not technically impossible).
                }
                
                dictionary.Add(name, serializer.Deserialize(reader.ReadToContentAndAssert(), keyValueTypes[1]));
            }
            return dictionary;
        }
    
        public override bool CanWrite => false;
        public override void WriteJson(JsonWriter writer, object? value, JsonSerializer serializer) => throw new NotImplementedException();
    }
    
    public static partial class JsonExtensions
    {
        public static JsonReader AssertTokenType(this JsonReader reader, JsonToken tokenType) => 
            reader.TokenType == tokenType ? reader : throw new JsonSerializationException(string.Format("Unexpected token {0}, expected {1}", reader.TokenType, tokenType));
        
        public static JsonReader ReadToContentAndAssert(this JsonReader reader) =>
            reader.ReadAndAssert().MoveToContentAndAssert();
    
        public static JsonReader MoveToContentAndAssert(this JsonReader reader)
        {
            if (reader == null)
                throw new ArgumentNullException();
            if (reader.TokenType == JsonToken.None)       // Skip past beginning of stream.
                reader.ReadAndAssert();
            while (reader.TokenType == JsonToken.Comment) // Skip past comments.
                reader.ReadAndAssert();
            return reader;
        }
    
        public static JsonReader ReadAndAssert(this JsonReader reader)
        {
            if (reader == null)
                throw new ArgumentNullException();
            if (!reader.Read())
                throw new JsonReaderException("Unexpected end of JSON stream.");
            return reader;
        }
    }
    
    public static class TypeExtensions
    {
        public static IEnumerable<Type> GetInterfacesAndSelf(this Type type)
            => (type ?? throw new ArgumentNullException()).IsInterface ? new[] { type }.Concat(type.GetInterfaces()) : type.GetInterfaces();
    
        public static IEnumerable<Type []> GetDictionaryKeyValueTypes(this Type type)
            => type.GetInterfacesAndSelf().Where(t => t.IsGenericType && t.GetGenericTypeDefinition() == typeof(IDictionary<,>)).Select(t => t.GetGenericArguments());
        
        public static Type []? GetDictionaryKeyValueType(this Type type)
        {
            var types = type.GetDictionaryKeyValueTypes().ToList();
            return types.Count == 1 ? types[0] : null;
        }
        
        public static T ThrowOnNull<T>(this T? value) where T : class => value ?? throw new ArgumentNullException();
    }
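
    As a rough illustration of how the converter might be wired in, the sketch below registers it through JsonSerializerSettings and then checks whether the top-level keys came back interned. It assumes the Jeff class from the question and the converter above are in scope; the UsageDemo class and the sample JSON are hypothetical. The same settings should also work with JsonSerializer.Create(settings) if you call serializer.Deserialize<T>() on a reader.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

// Assumes the Jeff class from the question and the converter above are in scope.
public static class UsageDemo
{
    public static Jeff Run()
    {
        // Hypothetical payload; "alpha" appears as a key at two nesting levels.
        const string json = @"{""string_to_int"":{""alpha"":{""beta"":{""alpha"":1}}}}";

        var settings = new JsonSerializerSettings
        {
            // The interning budget is an instance field of the converter,
            // so a new instance starts with a fresh budget.
            Converters = { new AutoInterningDictionaryKeyConverter() },
        };

        var jeff = JsonConvert.DeserializeObject<Jeff>(json, settings)
            ?? throw new JsonSerializationException("Unexpected null result");

        foreach (var key in jeff.StringToInt.Keys)
        {
            // An interned key is the same instance the intern pool hands back.
            Console.WriteLine($"{key}: {ReferenceEquals(key, string.IsInterned(key))}");
        }
        return jeff;
    }
}
```

    Because the nested values are deserialized through the same serializer, the converter should be picked up for the inner dictionaries as well. Reusing one converter instance across many calls shares a single budget between them.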
    

    Notes:

    • To prevent unexpected or malicious JSON from swamping your interned string pool and degrading the performance of your entire program, the converter adds only a limited number of strings to the pool (64 by default, per converter instance). It also adds only strings no longer than a specified length (2048 characters in the code above).

      You can modify the limits as required.

    • The converter works for read/write dictionaries that implement both IDictionary and IDictionary<string, TValue> for some TValue. It would need to be enhanced to support immutable dictionaries.
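
    The limiting logic from the first note can be looked at in isolation. The following is a minimal sketch using only the base class library; BoundedInterner and its parameter names are hypothetical and are not part of Json.NET or of the converter above, though the logic mirrors what ReadJson does for each key.

```csharp
using System;
using System.Threading;

// Hypothetical helper that mirrors the budget and length checks in ReadJson.
public sealed class BoundedInterner
{
    readonly int maxLength;
    int remaining; // How many new strings may still be added to the pool.

    public BoundedInterner(int maxToIntern, int maxLength)
    {
        this.remaining = maxToIntern;
        this.maxLength = maxLength;
    }

    public string Intern(string value)
    {
        // Reuse the pooled instance if one already exists; this costs no budget.
        if (string.IsInterned(value) is string pooled)
            return pooled;
        // Strings over the length limit are never added to the pool.
        if (value.Length > maxLength)
            return value;
        // Spend one unit of budget; only intern while the budget lasts.
        if (Interlocked.Decrement(ref remaining) >= 0)
            return string.Intern(value);
        // Budget exhausted: clamp the counter so it cannot drift toward int.MinValue.
        Volatile.Write(ref remaining, 0);
        return value;
    }
}

public static class BoundedInternerDemo
{
    public static void Main()
    {
        var interner = new BoundedInterner(maxToIntern: 1, maxLength: 16);

        // Build the strings at run time so they are not pooled as compile-time literals.
        string a = Guid.NewGuid().ToString("N").Substring(0, 8);
        string b = Guid.NewGuid().ToString("N").Substring(0, 8);

        interner.Intern(a); // Uses the single unit of budget.
        interner.Intern(b); // Budget is spent, so b is returned as-is.

        Console.WriteLine(string.IsInterned(a) != null);
        Console.WriteLine(string.IsInterned(b) != null);
    }
}
```

    Checking string.IsInterned first means keys that are already in the pool never consume budget, so the limit only bounds how many new entries a given payload can add.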



    [1] For confirmation, see this comment by JamesNK on the issue "Dictionary conversion for complex key-types is really buggy, defaults to ToString() output" (#2440):

    "JsonConverter isn't used for dictionary keys because they're strings, not JSON. That is why ToString or a TypeConverter is used."