Parse JSON with duplicate keys using custom Newtonsoft JSON converter


I have some invalid JSON that I need to parse using Newtonsoft. The problem is that, instead of using a proper array, the JSON contains a duplicate property for each entry in the array.

I've got some working code, but I'm not sure whether this is the right approach or whether there's an easier way.

The invalid JSON:

{
    "Quotes": {
        "Quote": {
            "Text": "Hi"
        },
        "Quote": {
            "Text": "Hello"
        }
    }
}

The object I'm trying to deserialize into:

class MyTestObject
{
    [JsonConverter(typeof(NewtonsoftQuoteListConverter))]
    public IEnumerable<Quote> Quotes { get; set; }
}

class Quote
{
    public string Text { get; set; }
}

The ReadJson method of the JsonConverter:

public override IEnumerable<Quote> ReadJson(JsonReader reader, Type objectType, IEnumerable<Quote> existingValue, bool hasExistingValue, JsonSerializer serializer)
{
    if (reader.TokenType == JsonToken.Null)
    {
        return null;
    }

    var quotes = new List<Quote>();
    while (reader.Read())
    {
        if (reader.Path.Equals("quotes", StringComparison.OrdinalIgnoreCase) && reader.TokenType == JsonToken.EndObject)
        {
            // This is the end of the Quotes block. We've parsed the entire object. Stop reading.
            break;
        }
        
        if (reader.Path.Equals("quotes.quote", StringComparison.OrdinalIgnoreCase) && reader.TokenType == JsonToken.StartObject)
        {
            // This is the start of a new Quote object. Parse it.
            quotes.Add(serializer.Deserialize<Quote>(reader));
        }
    }
    
    return quotes;
}

I only need reading of JSON with duplicate keys, not writing.


Solution

  • I can see a few problems with your converter:

    1. Because you hardcode the path, your converter won't work when MyTestObject is embedded in some higher-level container. In fact it will likely leave the reader positioned incorrectly.

    2. Your converter doesn't correctly skip past comments.

    3. Your converter doesn't populate the incoming existingValue when present, which is necessary when deserializing get-only collection properties.

    4. You don't take the current naming strategy into account.

    5. Your converter will not throw an exception or otherwise indicate an error when a truncated file is encountered.
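Point 1 can be seen in isolation with a small sketch. It assumes a hypothetical "Wrapper" container around the Quotes object from the question, and the `PathDemo` class and `StartObjectPaths` helper are illustrative names, not part of Json.NET. Because `reader.Path` is rooted at the start of the document, a comparison against the fixed string "quotes.quote" would not match once the object is nested.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

static class PathDemo
{
    // Collects reader.Path at every StartObject token in the given JSON.
    public static List<string> StartObjectPaths(string json)
    {
        var paths = new List<string>();
        using (var reader = new JsonTextReader(new StringReader(json)))
        {
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                    paths.Add(reader.Path);
            }
        }
        return paths;
    }

    static void Main()
    {
        // Hypothetical "Wrapper" container around the Quotes object from the question.
        var json = @"{ ""Wrapper"": { ""Quotes"": { ""Quote"": { ""Text"": ""Hi"" } } } }";
        foreach (var path in StartObjectPaths(json))
            Console.WriteLine($"'{path}'");
        // The innermost path is "Wrapper.Quotes.Quote", not "Quotes.Quote",
        // so a check against the hardcoded "quotes.quote" never succeeds here.
    }
}
```

One way around this, if you wanted to keep a hand-written reader loop, would be to compare against `reader.Depth` recorded on entry rather than against an absolute path.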

    As an alternative, you can take advantage of the fact that Json.NET calls a property's setter every time that property is encountered in the JSON. A set-only surrogate property in a DTO can therefore accumulate the repeated "Quote" property values, like so:

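    using System;
    using System.Collections.Generic;
    using System.Linq;
    using Newtonsoft.Json;
    using Newtonsoft.Json.Serialization;

    class NewtonsoftQuoteListConverter : JsonConverter<IEnumerable<Quote>>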
    {
        class DTO
        {
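            // The collection that accumulates the results ("init" requires C# 9 or later).
            public ICollection<Quote> Quotes { get; init; }
            // Set-only surrogate: invoked once for every "Quote" property found in the JSON.
            public Quote Quote { set => Quotes.Add(value); }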
        }
    
        public override IEnumerable<Quote> ReadJson(JsonReader reader, Type objectType, IEnumerable<Quote> existingValue, bool hasExistingValue, JsonSerializer serializer)
        {
            if (reader.MoveToContentAndAssert().TokenType == JsonToken.Null)
                return null;
            // Reuse the existing value if it is a writable collection; otherwise allocate a new list.
            var dto = new DTO { Quotes = existingValue is ICollection<Quote> l && !l.IsReadOnly ? l : new List<Quote>() };
            serializer.Populate(reader, dto);
            return dto.Quotes;
        }
        
        public override bool CanWrite => true; // Replace with false if you don't need custom serialization.
        
        public override void WriteJson(JsonWriter writer, IEnumerable<Quote> value, JsonSerializer serializer)
        {
            // Look up the serialized name of DTO.Quote so that the serializer's naming strategy is respected.
            var name = ((JsonObjectContract)serializer.ContractResolver.ResolveContract(typeof(DTO))).Properties.First(p => p.UnderlyingName == nameof(DTO.Quote)).PropertyName;
        
            writer.WriteStartObject();
            foreach (var item in value)
            {
                writer.WritePropertyName(name);
                serializer.Serialize(writer, item);
            }
            writer.WriteEndObject();
        }
    }
    
    public static partial class JsonExtensions
    {
        public static JsonReader MoveToContentAndAssert(this JsonReader reader)
        {
            if (reader == null)
                throw new ArgumentNullException(nameof(reader));
            if (reader.TokenType == JsonToken.None)       // Skip past beginning of stream.
                reader.ReadAndAssert();
            while (reader.TokenType == JsonToken.Comment) // Skip past comments.
                reader.ReadAndAssert();
            return reader;
        }
    
        public static JsonReader ReadAndAssert(this JsonReader reader)
        {
            if (reader == null)
                throw new ArgumentNullException(nameof(reader));
            if (!reader.Read())
                throw new JsonReaderException("Unexpected end of JSON stream.");
            return reader;
        }
    }
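The repeated-setter behaviour that the converter relies on can also be tried on its own. The following is only a minimal sketch: `QuoteAccumulator` and `SetterDemo` are hypothetical names introduced for illustration, and the JSON is a flattened version of the inner "Quotes" object from the question.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

class Quote
{
    public string Text { get; set; }
}

// Hypothetical type, used only to show the repeated-setter behaviour in isolation.
class QuoteAccumulator
{
    // Not mapped to JSON; it only stores what the surrogate setter receives.
    [JsonIgnore]
    public List<Quote> Items { get; } = new List<Quote>();

    // Set-only property: every "Quote" occurrence in the JSON appends one item.
    public Quote Quote { set => Items.Add(value); }
}

static class SetterDemo
{
    // Deserializes the JSON and returns the accumulated quotes.
    public static List<Quote> Parse(string json) =>
        JsonConvert.DeserializeObject<QuoteAccumulator>(json).Items;

    static void Main()
    {
        var json = @"{ ""Quote"": { ""Text"": ""Hi"" }, ""Quote"": { ""Text"": ""Hello"" } }";
        foreach (var quote in Parse(json))
            Console.WriteLine(quote.Text); // Expected to print "Hi", then "Hello".
    }
}
```

The converter above wraps the same idea in a private DTO so that the surrogate property does not leak into the public model.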
    

    By using a DTO, the current naming convention is taken into account.
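To illustrate how the property name follows the contract resolver, here is a small sketch. `QuoteDto` is a hypothetical stand-in for the converter's private DTO, and `NamingDemo.ResolveQuoteName` is an illustrative helper; only `ResolveContract`, `JsonObjectContract.Properties`, `UnderlyingName` and `PropertyName` are actual Json.NET members.

```csharp
using System;
using System.Linq;
using Newtonsoft.Json.Serialization;

// Hypothetical stand-in for the converter's private DTO.
class QuoteDto
{
    public string Quote { get; set; }
}

static class NamingDemo
{
    // Returns the JSON property name that the given resolver assigns to QuoteDto.Quote.
    public static string ResolveQuoteName(IContractResolver resolver)
    {
        var contract = (JsonObjectContract)resolver.ResolveContract(typeof(QuoteDto));
        return contract.Properties
            .First(p => p.UnderlyingName == nameof(QuoteDto.Quote))
            .PropertyName;
    }

    static void Main()
    {
        // The default resolver keeps the C# name as-is.
        Console.WriteLine(ResolveQuoteName(new DefaultContractResolver()));                // Quote
        // The camel-case resolver lowercases the first letter.
        Console.WriteLine(ResolveQuoteName(new CamelCasePropertyNamesContractResolver())); // quote
    }
}
```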

    If you don't need custom serialization, change the CanWrite override to return false, in which case Json.NET falls back to its default serialization when writing.
