Tags: c#, json, deserialization, protocol-buffers, system.text.json

How can I deserialize a huge JSON file to a ProtoBuf-generated class using JsonSerializer.DeserializeAsyncEnumerable when a property name differs?


I have a huge JSON file, so I used the code below, and it works.

using (FileStream? fileStream = new FileStream("hugefile.json", FileMode.Open))
{
    IAsyncEnumerable<Person?> people = JsonSerializer.DeserializeAsyncEnumerable<Person?>(fileStream);
    await foreach (Person? person in people)
    {
        Console.WriteLine($"Hello, my name is {person.Name}!");
    }
}

My problem is with the Person class, which is generated by protobuf. It contains a property named TrackingDatas that carries the ProtoMember attribute shown below, but in my huge JSON the property name is TrackingData. I want to deserialize it without adding anything to or removing anything from the ProtoBuf class. Does anyone have any idea?

[global::ProtoBuf.ProtoMember(2, Name = @"TrackingData")]
public global::System.Collections.Generic.List<EntityTrackingActivity> TrackingDatas { get; } = new global::System.Collections.Generic.List<EntityTrackingActivity>();

I tried the code below to change the name of the property, but it did not work for me.

public class CustomNamingPolicy : JsonNamingPolicy
{
    private readonly Dictionary<string, string> NameMapping = new Dictionary<string, string>()
    {
        [nameof(OASISLevel2TrackingPacket.EntityTracking.TrackingDatas)] = "TrackingData"
    };

    public override string ConvertName(string name)
    {
        var a = NameMapping.GetValueOrDefault(name, name);

        return a;
    }
}

var options = new JsonSerializerOptions()
{
    PropertyNamingPolicy = new CustomNamingPolicy()
};
using (FileStream? fileStream = new FileStream("hugefile.json", FileMode.Open))
{
    IAsyncEnumerable<Person?> people = JsonSerializer.DeserializeAsyncEnumerable<Person?>(fileStream, options);
    await foreach (Person? person in people)
    {
        Console.WriteLine($"Hello, my name is {person.Name}!");
    }
}

Solution

  • You have two problems here:

    1. The name of the property TrackingDatas does not match the JSON name "TrackingData", but your type is auto-generated by Protobuf so you cannot easily modify it.

      You have correctly fixed this by adding a PropertyNamingPolicy that remaps all properties named TrackingDatas (in all types) to "TrackingData".

    2. Your collection property

      public List<EntityTrackingActivity> TrackingDatas { get; } = new ();
      

      is read only, but System.Text.Json does not support deserializing read-only collection properties prior to .NET 8.

      For confirmation, see "Can System.Text.Json.JsonSerializer serialize collections on a read-only property?" and "What's new in .NET 8: Read-only properties".
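
      The second problem can be reproduced in isolation. The following is only a minimal sketch: the classes are simplified stand-ins for your generated types (ProtoBuf attributes omitted), ReadOnlyListRepro is a hypothetical helper name, and the JSON deliberately uses the CLR property name so that naming plays no role.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;

// Simplified stand-ins for the protobuf-generated types (ProtoBuf attributes omitted).
public class EntityTrackingActivity
{
    public int Id { get; set; }
}

public class Person
{
    public string? Name { get; set; }

    // Get-only, pre-allocated list, as in the generated code.
    public List<EntityTrackingActivity> TrackingDatas { get; } = new();
}

// Hypothetical helper, not part of the original code.
public static class ReadOnlyListRepro
{
    // Returns how many items ended up in the get-only list after deserialization.
    public static int CountTrackingItems(JsonSerializerOptions? options = null)
    {
        // The JSON name equals the property name here, so only the missing setter matters.
        const string json = "{\"Name\":\"Ann\",\"TrackingDatas\":[{\"Id\":1},{\"Id\":2}]}";
        Person person = JsonSerializer.Deserialize<Person>(json, options)!;
        return person.TrackingDatas.Count;
    }
}
```

      With default options, ReadOnlyListRepro.CountTrackingItems() should report an empty list even though the JSON contains two items, because the serializer skips properties it cannot set.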

    So, what are your options for resolving the second problem?

    Firstly, you could deserialize to some appropriate PersonDTO then map the DTO to Person using, say, AutoMapper.
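
    A minimal sketch of the DTO approach follows, assuming .NET 6 or later. PersonDto, ToPerson() and PersonDtoReader are hypothetical names introduced for illustration, the Person and EntityTrackingActivity classes are simplified stand-ins for your generated types, and the mapping is done by hand rather than with AutoMapper to keep the example dependency-free.

```csharp
#nullable enable
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Simplified stand-ins for the protobuf-generated types (ProtoBuf attributes omitted).
public class EntityTrackingActivity
{
    public int Id { get; set; }
}

public class Person
{
    public string? Name { get; set; }
    public List<EntityTrackingActivity> TrackingDatas { get; } = new();
}

// Hypothetical DTO whose property names match the JSON exactly and which has ordinary setters.
public class PersonDto
{
    public string? Name { get; set; }
    public List<EntityTrackingActivity>? TrackingData { get; set; }

    // Hand-written mapping; a mapping library could replace this method.
    public Person ToPerson()
    {
        var person = new Person { Name = Name };
        if (TrackingData != null)
            person.TrackingDatas.AddRange(TrackingData);
        return person;
    }
}

// Hypothetical helper that streams DTOs from a JSON array and maps each one to Person.
public static class PersonDtoReader
{
    public static async IAsyncEnumerable<Person> ReadPeopleAsync(Stream utf8Json)
    {
        await foreach (PersonDto? dto in JsonSerializer.DeserializeAsyncEnumerable<PersonDto>(utf8Json))
        {
            if (dto != null) // skip null array entries
                yield return dto.ToPerson();
        }
    }
}
```

    Each DTO is mapped as soon as it is read, so the file is still processed one element at a time, and no naming policy is needed because the DTO already uses the JSON name.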

    Secondly in .NET 5 and later, if your automatically generated Person class was declared as partial, e.g.:

    [global::ProtoBuf.ProtoContract]
    public partial class EntityTracking
    {
        [global::ProtoBuf.ProtoMember(2, Name = @"TrackingData")]
        public global::System.Collections.Generic.List<EntityTrackingActivity> TrackingDatas { get; } = new global::System.Collections.Generic.List<EntityTrackingActivity>();      
    }
    
    [global::ProtoBuf.ProtoContract]
    public partial class Person : EntityTracking
    {
        [global::ProtoBuf.ProtoMember(1, Name = @"Name")]
        public string? Name { get; set; }
    }
    
    [global::ProtoBuf.ProtoContract]
    public partial class EntityTrackingActivity
    {
        [global::ProtoBuf.ProtoMember(1, Name = @"Id")]
        public int Id { get; set; }
    }
    

    You could add a parameterized constructor with a List<EntityTrackingActivity> trackingDatas argument and mark it with [JsonConstructor] like so:

    public partial class Person
    {
        public Person() { } // Add parameterless constructor if not already auto-generated by protobuf
        
        [JsonConstructor]
        public Person(List<EntityTrackingActivity> trackingDatas) => this.TrackingDatas.AddRange(trackingDatas ?? throw new ArgumentNullException(nameof(trackingDatas)));
    }
    

    And now you will be able to deserialize the TrackingDatas property.

    Demo fiddle #1 here.

    Thirdly, in .NET 7 and later, Microsoft has added the ability to programmatically customize the serialization contract that System.Text.Json creates for each .NET type. Using this API you can add a typeInfo modifier to map all JSON property names to the value of ProtoMemberAttribute.Name, and to add synthetic setters to get-only List<T> properties. This approach completely avoids the need to modify your types in any way.

    First, add the following extension methods:

    public static partial class JsonExtensions
    {
        public static Action<JsonTypeInfo> InitializeProtoMemberNames(Type type) => typeInfo => 
        {
            if (typeInfo.Kind != JsonTypeInfoKind.Object)
                return;
            if (!type.IsAssignableFrom(typeInfo.Type))
                return;
            // Fix property name(s).
            foreach (var property in typeInfo.Properties)
            {
                // Set the JSON property name to be the same as ProtoMemberAttribute.Name
                var name = property.AttributeProvider?.GetCustomAttributes(typeof(global::ProtoBuf.ProtoMemberAttribute), true)
                    .OfType<global::ProtoBuf.ProtoMemberAttribute>()
                    .FirstOrDefault()
                    ?.Name;
                if (name != null)
                    property.Name = name;
            }
        };
    
        public static Action<JsonTypeInfo> InitializeGetOnlyListSetters(Type type) => typeInfo => 
        {
            if (typeInfo.Kind != JsonTypeInfoKind.Object)
                return;
            if (!type.IsAssignableFrom(typeInfo.Type))
                return;
            // Add synthetic list setters.
            foreach (var property in typeInfo.Properties)
            {
                if (property.Get != null && property.Set == null && property.PropertyType.GetListItemType() is {} itemType)
                {
                    var method = typeof(JsonExtensions).GetMethod(nameof(JsonExtensions.CreateGetOnlyListPropertySetter),
                                                                  BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Static)!;
                    var genericMethod = method.MakeGenericMethod(new[] { itemType });
                    var setter = genericMethod.Invoke(null, new object[] { property }) as Action<object, object?>;
                    property.Set = setter;
                }
            }
        };
        
        static Action<Object,Object?>? CreateGetOnlyListPropertySetter<TItem>(JsonPropertyInfo property)
        {
            if (property.Get == null)
                return null;
            (var getter, var name) = (property.Get, property.Name);
            return (obj, value) =>
            {
                var oldValue = (List<TItem>?)getter(obj);
                var newValue = value as List<TItem>;
                if (newValue == oldValue)
                    return;
                else if (oldValue == null)
                    throw new JsonException($"Cannot populate list {name} in {obj}."); // interpolated string, so name and obj appear in the message
                oldValue.Clear();
                if (newValue != null)
                    oldValue.AddRange(newValue);
            };
        }
    
        static MemberInfo? GetMemberInfo(this JsonPropertyInfo property) => (property.AttributeProvider as MemberInfo);
        
        static Type? GetListItemType(this Type type) =>
            type.IsGenericType && type.GetGenericTypeDefinition() == typeof(List<>) ? type.GetGenericArguments()[0] : null;
    }
    

    And then deserialize e.g. as follows:

    var options = new JsonSerializerOptions
    {
        TypeInfoResolver = new DefaultJsonTypeInfoResolver
        {
            Modifiers = { 
                JsonExtensions.InitializeProtoMemberNames(typeof(Person)), 
                JsonExtensions.InitializeGetOnlyListSetters(typeof(Person)) 
            },
        },
    };
    
    await using (FileStream fileStream = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize: 4096, useAsync: true))
    {
        IAsyncEnumerable<Person?> people = JsonSerializer.DeserializeAsyncEnumerable<Person?>(fileStream, options);
        await foreach (Person? person in people)
        {
            Console.WriteLine($"Hello, my name is \"{person?.Name}\", my tracking data is {JsonSerializer.Serialize(person?.TrackingDatas.Select(t => t.Id))}!");
        }           
    }
    

    Notes:

    • As explained in Asynchronous streams and disposables, the await using syntax should be used to dispose of file streams when writing async code.

    • In order to actually enable asynchronous FileStream access, pass useAsync: true to the FileStream constructor. See the docs for a discussion of the possible performance implications.

    • CustomNamingPolicy is no longer needed with this approach.

    Demo fiddle #2 here.

    Finally, in .NET 8 and later, populating of preallocated read-only collection properties is supported out of the box by setting

    JsonSerializerOptions.PreferredObjectCreationHandling = 
        JsonObjectCreationHandling.Populate;
    

    Thus InitializeGetOnlyListSetters() is no longer needed and your code can be simplified e.g. as follows:

    var options = new JsonSerializerOptions
    {
        PreferredObjectCreationHandling = JsonObjectCreationHandling.Populate,           
        TypeInfoResolver = new DefaultJsonTypeInfoResolver()
            .WithAddedModifier(JsonExtensions.InitializeProtoMemberNames(typeof(Person))),
    };
    
    await using var fileStream = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize: 4096, useAsync: true);
    var people = JsonSerializer.DeserializeAsyncEnumerable<Person?>(fileStream, options);
    await foreach (var person in people)
    {
        // Code to process each person
    }   
    

    For details, see Populate initialized properties.

    Demo fiddle #3 here.