
string Intern on serializer.Deserialize<T>()


I am currently using Json.NET to deserialise a string containing a mid-size collection of objects, about 7,000 items in total.

Each item has a recurring group of 4 identical strings. Memory profiling shows that this creates about 40,000 separate string instances, depending on nesting etc.

Is there a way to get the serializer to use the same reference for each identical string?

Example Json:

    [{
        "name":"jon bones",
        "groups":[{
            "groupName":"Region",
            "code":"1"
        },{
            "groupName":"Class",
            "code":"4"
        }]
    },
    {
        "name":"Swan moans",
        "groups":[{
            "groupName":"Region",
            "code":"12"
        },{
            "groupName":"Class",
            "code":"1"
        }]
    }]

I have added an example. As you can see, the groupName values repeat on almost all objects; only the relevant codes change. It's not a great concern yet, but as the dataset grows I would rather not increase allocations too much.

Also, it might seem like the "code" values repeat, but each one is unique to its person. They are basically multiple identifiers for the same object.


Solution

  • If you know your 4 standard strings in advance, you can intern them with String.Intern() (or just declare them as string literals somewhere, which does the same job) and then use the following custom JsonConverter, which replaces each JSON string value with its interned instance if one is found:

    using System;
    using Newtonsoft.Json;
    using Newtonsoft.Json.Linq;

    public class InternedStringConverter : JsonConverter
    {
        public override bool CanConvert(Type objectType)
        {
            return objectType == typeof(string);
        }
    
        public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
        {
            if (reader.TokenType == JsonToken.Null)
                return null;
            // Load via JToken in case the value is a non-string literal such as an integer.
            var s = reader.TokenType == JsonToken.String ? (string)reader.Value : (string)JToken.Load(reader);
            // String.IsInterned returns the pooled instance, or null if the value is not in the pool.
            return String.IsInterned(s) ?? s;
        }
    
        public override bool CanWrite { get { return false; } }
    
        public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
        {
            throw new NotImplementedException();
        }
    }
    

    This can be applied globally via serializer settings:

    var settings = new JsonSerializerSettings { Converters = new [] { new InternedStringConverter() } };
    var root = JsonConvert.DeserializeObject<RootObject>(jsonString, settings);
    

    You can also apply it to the specific string collection using JsonPropertyAttribute.ItemConverterType:

    public class Group
    {
        [JsonProperty(ItemConverterType = typeof(InternedStringConverter))]
        public List<string> StandardStrings { get; set; }
    }
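
    The lookup that InternedStringConverter relies on can be tried in isolation. The following is only a minimal sketch: InternLookupDemo and Canonical are hypothetical names used for illustration, and the strings are built at run time to stand in for values coming out of a JSON reader.

```csharp
using System;

public static class InternLookupDemo
{
    // Same lookup the converter performs: return the pooled instance if one
    // exists, otherwise hand back the string that was passed in.
    public static string Canonical(string s)
    {
        return String.IsInterned(s) ?? s;
    }

    public static void Main()
    {
        // Make sure "Region" is in the intern pool.
        string pooled = String.Intern("Region");

        // Build an equal string at run time, as a JSON reader would.
        string parsed = new string("Region".ToCharArray());

        // Build a value that has never been interned (no matching literal exists).
        string unknown = "not-pooled-" + Guid.NewGuid().ToString("N");

        Console.WriteLine(ReferenceEquals(parsed, pooled));              // False: separate object
        Console.WriteLine(ReferenceEquals(Canonical(parsed), pooled));   // True: swapped for the pooled instance
        Console.WriteLine(ReferenceEquals(Canonical(unknown), unknown)); // True: nothing pooled, returned as-is
    }
}
```

    Because the lookup never adds anything to the pool, unknown strings pass through unchanged and the pool does not grow.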
    

    If you don't know the 4 strings in advance, you can create a converter that interns the strings as they are read:

    public class AutoInterningStringConverter : JsonConverter
    {
        public override bool CanConvert(Type objectType)
        {
            // CanConvert is not called when a converter is applied directly to a property.
            throw new NotImplementedException("AutoInterningStringConverter should not be used globally");
        }
    
        public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
        {
            if (reader.TokenType == JsonToken.Null)
                return null;
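            // Load via JToken in case the value is a non-string literal such as an integer.
            var s = reader.TokenType == JsonToken.String ? (string)reader.Value : (string)JToken.Load(reader);
            // String.Intern adds the value to the pool if it is not already there and returns the pooled instance.
            return String.Intern(s);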
        }
    
        public override bool CanWrite { get { return false; } }
    
        public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
        {
            throw new NotImplementedException();
        }
    }
    

    However, I strongly recommend against using this globally, as you could end up adding enormous numbers of strings to the runtime's intern pool, and interned strings are unlikely to be released until the CLR terminates. Instead, apply it only to the specific string collection(s) that you are confident contain many duplicates of a small number of unique strings:

    public class Group
    {
        [JsonProperty(ItemConverterType = typeof(AutoInterningStringConverter))]
        public List<string> StandardStrings { get; set; }
    }
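
    If growth of the intern pool is the main worry, one possible variation is to deduplicate through a private dictionary instead of String.Intern. This is a different technique from the converters above and only a sketch: PooledStringConverter, Canonical, Clear and the MaxPoolSize cap of 1024 are all assumptions made for illustration, not part of Json.NET.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical alternative to AutoInterningStringConverter: deduplicates through
// a private dictionary, so the process-wide intern pool is never touched.
public class PooledStringConverter : JsonConverter
{
    private const int MaxPoolSize = 1024; // assumed cap, tune for your data
    private static readonly Dictionary<string, string> Pool = new Dictionary<string, string>();

    // Returns the first instance seen for this value, remembering new values
    // until the cap is reached.
    public static string Canonical(string s)
    {
        lock (Pool)
        {
            string pooled;
            if (Pool.TryGetValue(s, out pooled))
                return pooled;
            if (Pool.Count < MaxPoolSize)
                Pool.Add(s, s);
            return s;
        }
    }

    // Drops every pooled string, for example once deserialization has finished.
    public static void Clear()
    {
        lock (Pool) { Pool.Clear(); }
    }

    public override bool CanConvert(Type objectType)
    {
        return objectType == typeof(string);
    }

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
    {
        if (reader.TokenType == JsonToken.Null)
            return null;
        var s = reader.TokenType == JsonToken.String ? (string)reader.Value : (string)JToken.Load(reader);
        return Canonical(s);
    }

    public override bool CanWrite { get { return false; } }

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        throw new NotImplementedException();
    }
}
```

    The trade-off is that the dictionary is bounded and can be cleared, while strings it handed out stay alive only as long as your objects reference them. It would be applied through ItemConverterType or [JsonConverter] in the same way as the other converters.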
    

    Update

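    From your updated question, I see you have string properties with standard values, rather than a collection of strings with standard values. Thus you would use [JsonConverter(typeof(AutoInterningStringConverter))] on each such property. The code property is left unconverted because its values are unique per person, so interning them would only grow the pool: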

    public class Group
    {
        [JsonConverter(typeof(AutoInterningStringConverter))]
        public string groupName { get; set; }
    
        public string code { get; set; }
    }
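
    To check that the attribute has the intended effect, the sample JSON from the question can be deserialized and the groupName references compared. This is a self-contained sketch under a few assumptions: the converter is repeated so the example compiles on its own, and Person, InterningCheck and SampleJson are hypothetical names, since the question does not name its root type.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Copy of the converter from the answer so this sketch compiles on its own.
public class AutoInterningStringConverter : JsonConverter
{
    public override bool CanConvert(Type objectType)
    {
        // Not called when the converter is applied directly to a property.
        throw new NotImplementedException("AutoInterningStringConverter should not be used globally");
    }

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
    {
        if (reader.TokenType == JsonToken.Null)
            return null;
        var s = reader.TokenType == JsonToken.String ? (string)reader.Value : (string)JToken.Load(reader);
        return String.Intern(s);
    }

    public override bool CanWrite { get { return false; } }

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        throw new NotImplementedException();
    }
}

public class Group
{
    [JsonConverter(typeof(AutoInterningStringConverter))]
    public string groupName { get; set; }

    public string code { get; set; }
}

// Hypothetical item type for the array in the question.
public class Person
{
    public string name { get; set; }
    public List<Group> groups { get; set; }
}

public static class InterningCheck
{
    public const string SampleJson = @"[
      {""name"":""jon bones"",""groups"":[{""groupName"":""Region"",""code"":""1""},{""groupName"":""Class"",""code"":""4""}]},
      {""name"":""Swan moans"",""groups"":[{""groupName"":""Region"",""code"":""12""},{""groupName"":""Class"",""code"":""1""}]}
    ]";

    public static List<Person> Load()
    {
        return JsonConvert.DeserializeObject<List<Person>>(SampleJson);
    }

    public static void Main()
    {
        var people = Load();
        string first = people[0].groups[0].groupName;  // "Region" from the first person
        string second = people[1].groups[0].groupName; // "Region" from the second person

        // Both values went through String.Intern, so they are the same object.
        Console.WriteLine(ReferenceEquals(first, second)); // True
    }
}
```

    Running the same comparison in a memory profiler before and after adding the attribute should show the count of duplicate groupName strings drop accordingly.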