Tags: c#, json.net, deserialization, benchmarkdotnet

Benchmarking Newtonsoft.Json deserialization: from stream and from string


I'm interested in a performance comparison (speed, memory usage) of two approaches to deserializing an HTTP response JSON payload using Newtonsoft.Json.

I'm aware of Newtonsoft.Json's Performance Tips advice to use streams, but I wanted to know more and have hard numbers. I've written a simple benchmark using BenchmarkDotNet, but I'm a bit puzzled by the results (see the numbers below).

What I got:

  • parsing from a stream is always faster, but not by much
  • parsing small and "medium" JSON shows equal or better memory usage when a string is used as input
  • a significant difference in memory usage only shows up with large JSON (where the string itself ends up on the LOH)

I haven't had time to do proper profiling (yet), and I'm a bit surprised by the memory overhead of the stream approach (assuming there's no error on my side). The whole code is here.

My questions:

  • Is my approach correct? (usage of MemoryStream; simulating HttpResponseMessage and its content; ...)
  • Is there any issue with benchmarking code?
  • Why do I see such results?

Benchmark setup

I'm preparing a MemoryStream once so it can be reused over and over within the benchmark run:

[GlobalSetup]
public void GlobalSetup()
{
    var resourceName = _resourceMapping[typeof(T)];
    using (var resourceStream = Assembly.GetExecutingAssembly().GetManifestResourceStream(resourceName))
    {
        _memory = new MemoryStream();
        resourceStream.CopyTo(_memory);
    }

    _iterationRepeats = _repeatMapping[typeof(T)];
}

Stream deserialization

[Benchmark(Description = "Stream d13n")]
public async Task DeserializeStream()
{
    for (var i = 0; i < _iterationRepeats; i++)
    {
        var response = BuildResponse(_memory);

        using (var streamReader = BuildNonClosingStreamReader(await response.Content.ReadAsStreamAsync()))
        using (var jsonReader = new JsonTextReader(streamReader))
        {
            _serializer.Deserialize<T>(jsonReader);
        }
    }
}

String deserialization

Here we first read the JSON from the stream into a string and only then deserialize it - an extra string allocation happens, and that string is then used as the deserialization input.

[Benchmark(Description = "String d13n")]
public async Task DeserializeString()
{
    for (var i = 0; i < _iterationRepeats; i++)
    {
        var response = BuildResponse(_memory);

        var content = await response.Content.ReadAsStringAsync();
        JsonConvert.DeserializeObject<T>(content);
    }
}

Common methods

private static HttpResponseMessage BuildResponse(Stream stream)
{
    stream.Seek(0, SeekOrigin.Begin);

    var content = new StreamContent(stream);
    content.Headers.ContentType = new MediaTypeHeaderValue("application/json");

    return new HttpResponseMessage(HttpStatusCode.OK)
    {
        Content = content
    };
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static StreamReader BuildNonClosingStreamReader(Stream inputStream) =>
    new StreamReader(
        stream: inputStream,
        encoding: Encoding.UTF8,
        detectEncodingFromByteOrderMarks: true,
        bufferSize: 1024,
        leaveOpen: true);

Results

Small JSON

Repeated 10000 times

  • Stream: mean 25.69 ms, 61.34 MB allocated
  • String: mean 31.22 ms, 36.01 MB allocated

Medium JSON

Repeated 1000 times

  • Stream: mean 24.07 ms, 12 MB allocated
  • String: mean 25.09 ms, 12.85 MB allocated

Large JSON

Repeated 100 times

  • Stream: mean 229.6 ms, 47.54 MB allocated, objects got to Gen 1
  • String: mean 240.8 ms, 92.42 MB allocated, objects got to Gen 2!

Update

I went through the source of JsonConvert and found out that it internally uses a JsonTextReader over a StringReader when deserializing from a string: JsonConvert:816. A reader is involved there as well (of course!).
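For reference, the string path boils down to roughly this - my simplified reading of the linked JsonConvert source, not a verbatim copy, and the helper name is just for illustration (JsonSerializer and JsonTextReader come from Newtonsoft.Json, StringReader from System.IO):

// Roughly what JsonConvert.DeserializeObject<T>(content) does internally:
// wrap the string in a StringReader, then feed a JsonTextReader to the serializer.
private static T DeserializeFromStringSketch<T>(string content)
{
    var serializer = JsonSerializer.CreateDefault();
    using (var jsonReader = new JsonTextReader(new StringReader(content)))
    {
        return serializer.Deserialize<T>(jsonReader);
    }
}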

Then I decided to dig more into StreamReader itself, and I was stunned at first sight - it always allocates an array buffer (byte[]): StreamReader:244, which explains its memory use.
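A quick back-of-envelope estimate (my own, assuming - per my reading of the reference source - that each StreamReader instance allocates a byte[bufferSize] plus a char buffer of encoding.GetMaxCharCount(bufferSize) chars) is in the right ballpark for the small-JSON run:

// Rough per-StreamReader allocation estimate for bufferSize = 1024
// (assumption based on my reading of the StreamReader reference source, not measured):
const int bufferSize = 1024;
int charBufferChars = Encoding.UTF8.GetMaxCharCount(bufferSize);    // 1025 chars for UTF-8
long bytesPerReader = bufferSize + charBufferChars * sizeof(char);  // ~3 KB per instance
long smallJsonTotal = 10000L * bytesPerReader;                      // ~30 MB over 10,000 iterations

That alone would account for roughly half of the 61 MB reported for the small-JSON stream case above.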

This answers the "why". The solution is simple - use a smaller buffer size when instantiating the StreamReader. The minimum buffer size is 128 (see StreamReader.MinBufferSize), but you can supply any value > 0 (check the constructor overloads).

Of course, the buffer size has an effect on processing the data. As for what buffer size I should then use: it depends. When expecting smaller JSON responses, I think it is safe to stick with a small buffer.


Solution

  • After some fiddling I found the reason behind the memory allocation when using StreamReader. The original post is updated, but here is a recap:

    StreamReader uses a default bufferSize of 1024. Every instantiation of StreamReader then allocates a byte array of that size. That's the reason why I saw such numbers in my benchmark.

    When I set bufferSize to its lowest possible value, 128, the results look much better (see the adjusted helper below).
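For completeness, the fix is just the BuildNonClosingStreamReader helper from above with a smaller buffer (128 is my choice here; as noted, any value > 0 is accepted):

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static StreamReader BuildNonClosingStreamReader(Stream inputStream) =>
    new StreamReader(
        stream: inputStream,
        encoding: Encoding.UTF8,
        detectEncodingFromByteOrderMarks: true,
        bufferSize: 128,   // down from 1024; 128 matches StreamReader.MinBufferSize
        leaveOpen: true);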