Tags: .net, json.net, system.text.json, jsonlines, ndjson

How can I parse text that is a sequence of JSON objects without enclosing brackets (there is no root object) in .NET?


Suppose I have an input stream coming in that is a sequence of objects, like so:

{ "value" : 1}
{ "value" : 2}
{ "value" : 3}
{ "value" : 4}

How can I deserialize these objects in C#? Using System.Text.Json is preferred, but Newtonsoft is fine. It would be straightforward if the data were wrapped in brackets, since I could just deserialize into an array. But I don't have control over the incoming data stream.

Using System.Text.Json I could use JsonSerializer.DeserializeAsync, but only after I create my own stream class to wrap the existing data stream with brackets as it comes in. That would still be undesirable, because I would have to wait for the entire (synthetic) array to be read before I could look at the first element.


Solution

  • The file format shown in your question is sometimes called NDJSON (newline-delimited JSON) or JSON Lines. As of .NET 9, streaming deserialization of this format is officially supported by System.Text.Json.

    JsonSerializer.DeserializeAsyncEnumerable() now has additional overloads with a bool topLevelValues argument which enables support for multiple concatenated top-level JSON documents in a single stream.

    Thus, for the JSON shown in your question, if your model looks like:

    public record MyRecord(int value);
    

    you can now stream through the records in your NDJSON file as follows, using await foreach:

    // Open the file for asynchronous reading; dispose it asynchronously when done.
    await using var stream = new FileStream(filename,
                                            FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize: 1024,
                                            useAsync: true);
    var options = new JsonSerializerOptions
    {
        // Add any required options here
    };
    // topLevelValues: true treats the stream as a sequence of top-level JSON values
    // rather than a single JSON array.
    var items = JsonSerializer.DeserializeAsyncEnumerable<MyRecord>(stream,
                                                                    topLevelValues: true,
                                                                    options: options);
    await foreach (var item in items)
    {
        Console.WriteLine("   Item: {0}, Stream position: {1}", item, stream.Position);
    }
    

    Note, however, that System.Text.Json will not yield each record as soon as it is deserialized. Instead, it will try to read a chunk of bytes from the stream equal in size to JsonSerializerOptions.DefaultBufferSize (16,384 bytes by default), deserialize all the top-level values in the chunk, then yield them all at once. Normally this won't matter, because peak memory use is still bounded by the chunk size, but it may cause delays when reading from a NetworkStream (I have not verified this). You can force System.Text.Json to return values more eagerly by decreasing the buffer size, e.g.:

    var options = new JsonSerializerOptions
    {
        // Add any required options here
        DefaultBufferSize = 1024,
    };
    

    Note also that System.Text.Json doesn't actually require a newline or any other whitespace between records; they can simply be concatenated:

    {"value":1}{"value":2}{"value":3}{"value":4}{"value":5}{"value":6}{"value":7}
    

    For more, see Read multiple JSON documents.
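
    As a self-contained illustration of the point about concatenated records, here is a minimal sketch that reads from an in-memory stream instead of a file. It assumes .NET 9 or later; the Demo class and ReadAllAsync method are hypothetical names chosen for this example, not part of System.Text.Json.

    ```csharp
    using System.Collections.Generic;
    using System.IO;
    using System.Text;
    using System.Text.Json;
    using System.Threading.Tasks;

    public record MyRecord(int value);

    public static class Demo
    {
        // Hypothetical helper: collects every top-level JSON value in the text.
        // Values may be separated by newlines, other whitespace, or nothing at all.
        public static async Task<List<MyRecord>> ReadAllAsync(string json)
        {
            using var stream = new MemoryStream(Encoding.UTF8.GetBytes(json));
            var results = new List<MyRecord>();
            var items = JsonSerializer.DeserializeAsyncEnumerable<MyRecord>(
                stream, topLevelValues: true);
            await foreach (var item in items)
            {
                if (item is not null)
                {
                    results.Add(item);
                }
            }
            return results;
        }
    }
    ```

    Calling Demo.ReadAllAsync with a mix of newline-separated and directly concatenated objects should return one MyRecord per object, in input order. Collecting into a list is only for demonstration; in real streaming code you would process each item inside the await foreach loop.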

    Demo .NET 9 fiddle here: https://dotnetfiddle.net/GCIEN8.

    As for Json.NET, streaming deserialization of this format has been supported for years by setting JsonReader.SupportMultipleContent = true. See Line delimited json serializing and de-serializing.
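
    The Json.NET approach can be sketched roughly as follows. This assumes the Newtonsoft.Json package is referenced; NdJsonReader and ReadAll are hypothetical names for this example, while JsonTextReader.SupportMultipleContent is the Json.NET setting mentioned above.

    ```csharp
    using System.Collections.Generic;
    using System.IO;
    using Newtonsoft.Json;

    public record MyRecord(int value);

    public static class NdJsonReader
    {
        // Hypothetical helper: lazily yields one T per top-level JSON value.
        public static IEnumerable<T> ReadAll<T>(TextReader textReader)
        {
            var serializer = JsonSerializer.CreateDefault();
            using var jsonReader = new JsonTextReader(textReader)
            {
                // Allow several top-level values in a single input.
                SupportMultipleContent = true,
                // Leave the underlying TextReader open; the caller owns it.
                CloseInput = false,
            };
            // Each Read() advances to the start of the next top-level value.
            while (jsonReader.Read())
            {
                yield return serializer.Deserialize<T>(jsonReader);
            }
        }
    }
    ```

    Because this is an iterator, each object is deserialized only when the caller asks for it, e.g. foreach (var item in NdJsonReader.ReadAll<MyRecord>(new StreamReader(stream))) { ... }, so the first element is available without reading the rest of the input.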