Tags: c#, json, deserialization, .net-6.0, system.text.json

System.Text.Json UTF-8 Deserialization confusion


I have a generic JSON file-caching service that takes a .NET object and writes it to disk as default (minified) JSON. While this works nicely, I wanted to try improving it by storing and working with UTF-8 byte arrays.

// getFunc() is a Func<T> that just defines what the shape of the data is
byte[] bytes = JsonSerializer.SerializeToUtf8Bytes(getFunc());
await using FileStream stream = File.Create(filePath);
await JsonSerializer.SerializeAsync(stream, bytes);

To get this working, I'm using a sample/dummy JSON object that I grabbed online, for which I generated the corresponding .NET POCOs. When this code runs, serialization succeeds and my JSON file contains the following:

"eyJkYXRhIjpbeyJ0eXBlIjoiYXJ0aWNsZXMiLCJpZCI6IjEiLCJhdHRyaWJ1dGVzIjp7InRpdGxlIjoiSlNPTjpBUEkgcGFpbnRzIG15IGJpa2UiLCJib2R5IjoiVGhlIHNob3J0ZXN0IGFydGljbGUgZXZlciIsImNyZWF0ZWQiOiIyMDIzLTA4LTA5VDA3OjI1OjQwLjI5MTQ4MDQtMDQ6MDAiLCJ1cGRhdGVkIjoiMjAyMy0wOC0wOVQwNzoyNTo0MC4yOTE1MTY5LTA0OjAwIn0sInJlbGF0aW9uc2hpcHMiOnsiYXV0aG9yIjp7ImRhdGEiOnsiaWQiOiI0MiIsInR5cGUiOiJwZW9wbGUifX19fV0sImluY2x1ZGVkIjpbeyJ0eXBlIjoicGVvcGxlIiwiaWQiOiI0MiIsImF0dHJpYnV0ZXMiOnsibmFtZSI6Ik1hdHQiLCJhZ2UiOjMyLCJnZW5kZXIiOiJtYWxlIn19XX0="

Run through an online Base64 decoder, it decodes to the following:

{"data":[{"type":"articles","id":"1","attributes":{"title":"JSON:API paints my bike","body":"The shortest article ever","created":"2023-08-09T07:25:40.2914804-04:00","updated":"2023-08-09T07:25:40.2915169-04:00"},"relationships":{"author":{"data":{"id":"42","type":"people"}}}}],"included":[{"type":"people","id":"42","attributes":{"name":"Matt","age":32,"gender":"male"}}]}

So here's where the confusion is: to deserialize this byte[] from UTF-8, I followed this documentation section (https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json/how-to?pivots=dotnet-6-0#deserialize-from-utf-8), which says "this example assumes the JSON is in a byte array named jsonUtf8Bytes". That makes it seem as though these two lines should be all that's needed, but they throw a rather unhelpful JsonException that is causing me some grief:

ReadOnlySpan<byte> jsonSpan = File.ReadAllBytes(filePath);
var content = JsonSerializer.Deserialize<T>(jsonSpan);

System.Text.Json.JsonException: The JSON value could not be converted to MyProject.Rootobject. Path: $ | LineNumber: 0 | BytePositionInLine: 502.
   at System.Text.Json.ThrowHelper.ThrowJsonException_DeserializeUnableToConvertValue(Type propertyType)
   at System.Text.Json.Serialization.Converters.ObjectDefaultConverter`1.OnTryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, T& value)
   at System.Text.Json.Serialization.JsonConverter`1.TryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, T& value)
   at System.Text.Json.Serialization.JsonConverter`1.ReadCore(Utf8JsonReader& reader, JsonSerializerOptions options, ReadStack& state)
   at System.Text.Json.JsonSerializer.ReadFromSpan[TValue](ReadOnlySpan`1 utf8Json, JsonTypeInfo jsonTypeInfo, Nullable`1 actualByteCount)
   at System.Text.Json.JsonSerializer.Deserialize[TValue](ReadOnlySpan`1 utf8Json, JsonSerializerOptions options)

After hammering away at this, I was finally able to get the file back to the original .NET object, but only by first deserializing it as a string, converting that string from Base64 to a byte[], and then deserializing those bytes:

var fileBytes = File.ReadAllBytes(filePath);
var value = JsonSerializer.Deserialize<string>(fileBytes);
byte[] bytes = Convert.FromBase64String(value);
var content = JsonSerializer.Deserialize<T>(bytes);

Deserializing twice to get back to the original object feels redundant. I feel as though I'm missing something, but I don't know what I don't know. When the Deserialize call throws, the byte array has a length of 502, yet it is only ~374 after the extra Base64 conversion produces a byte[] that can be deserialized. Am I missing something, or is this the expected way to work with objects stored as byte[]?


Solution

  • The bug is in the code: it serializes the data twice and uses at least twice as much memory as necessary.

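    1. The data is serialized to JSON as UTF-8 bytes, and then
    2. those bytes are serialized to JSON again. System.Text.Json writes a byte[] as a Base64-encoded JSON string, so the file holds one quoted Base64 string instead of the original JSON object. This also explains the sizes in the question: Base64 encodes every 3 bytes as 4 characters, so 374 bytes of JSON become 4 × ceil(374 / 3) = 500 characters, and the two surrounding quotes bring the file to 502 bytes.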
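A minimal sketch of the double serialization, assuming a throwaway anonymous object in place of the question's POCOs (`Base64Demo`, `DoubleSerialize` and `ExpectedFileLength` are hypothetical names). The first pass yields UTF-8 JSON; the second pass turns that byte[] into a quoted Base64 string, and the length helper applies the 4-characters-per-3-bytes rule plus two quotes.

```csharp
using System.Text.Json;

// Hypothetical demo class; names are illustrative only.
public static class Base64Demo
{
    // Reproduces the question's write path without touching the disk:
    // returns the first-pass UTF-8 JSON and what would end up in the file.
    public static (byte[] Json, byte[] FileContents) DoubleSerialize<T>(T value)
    {
        // Pass 1: object -> UTF-8 JSON bytes, e.g. {"id":1}
        byte[] json = JsonSerializer.SerializeToUtf8Bytes(value);

        // Pass 2: byte[] -> a JSON string token holding Base64 text, quotes included
        byte[] fileContents = JsonSerializer.SerializeToUtf8Bytes(json);

        return (json, fileContents);
    }

    // Base64 emits 4 characters per (possibly padded) group of 3 bytes;
    // the serializer then adds an opening and a closing quote.
    public static int ExpectedFileLength(int jsonLength) => 4 * ((jsonLength + 2) / 3) + 2;
}
```

For example, `new { id = 1 }` first becomes `{"id":1}` and then the file contents `"eyJpZCI6MX0="`, which is the same shape as the string in the question.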

    There's no reason to call SerializeToUtf8Bytes first. JsonSerializer already writes UTF-8 JSON through Utf8JsonWriter, using buffers rented from a pool, so it already reuses buffers where needed. Serialize the object straight to the stream instead.
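A sketch of the caching service with a single serialization pass, assuming a static helper (`JsonFileCache`, `WriteAsync` and `ReadAsync` are hypothetical names, not from the question). It writes the object straight into the file stream and reads it back with `JsonSerializer.DeserializeAsync`, so no intermediate byte[] or Base64 step is involved.

```csharp
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical cache helper; names are illustrative only.
public static class JsonFileCache
{
    // Serializes the object once, directly into the file as UTF-8 JSON.
    public static async Task WriteAsync<T>(string filePath, T value)
    {
        await using FileStream stream = File.Create(filePath);
        await JsonSerializer.SerializeAsync(stream, value);
    }

    // Deserializes the UTF-8 JSON directly from the file stream.
    public static async Task<T?> ReadAsync<T>(string filePath)
    {
        await using FileStream stream = File.OpenRead(filePath);
        return await JsonSerializer.DeserializeAsync<T>(stream);
    }
}
```

With this approach the file contains ordinary minified JSON (the text the online decoder showed), so it stays human-readable and is no larger than the JSON itself.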

    The runtime source code shows that JsonSerializer.SerializeAsync hands the call to JsonTypeInfo<TValue>.SerializeAsync:

    JsonTypeInfo<TValue> jsonTypeInfo = GetTypeInfo<TValue>(options);
    return jsonTypeInfo.SerializeAsync(utf8Json, value, cancellationToken);

    That method in turn uses cached writers with buffers from a reusable pool:

    using var bufferWriter = new PooledByteBufferWriter(Options.DefaultBufferSize);
    Utf8JsonWriter writer = Utf8JsonWriterCache.RentWriter(Options, bufferWriter);
    
    try
    {
        SerializeHandler(writer, rootValue!);
        writer.Flush();
    }
    finally
    {
        // Record the serialization size in both successful and failed operations,
        // since we want to immediately opt out of the fast path if it exceeds the threshold.
        OnRootLevelAsyncSerializationCompleted(writer.BytesCommitted + writer.BytesPending);
    
        Utf8JsonWriterCache.ReturnWriter(writer);
    }
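If you do want to keep the byte[] route, a sketch under the same assumptions (`Utf8JsonFileCache`, `WriteAsync` and `Read` are hypothetical names): write the bytes produced by `SerializeToUtf8Bytes` to the file as-is instead of serializing them a second time. The file then holds exactly what the documentation's `jsonUtf8Bytes` example assumes, so one `Deserialize<T>` call over a `ReadOnlySpan<byte>` is enough. Unlike the pooled path shown above, this allocates a fresh array for the whole payload on every write and read.

```csharp
using System;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper showing the byte[] route done correctly:
// the UTF-8 JSON bytes are written raw, never serialized a second time.
public static class Utf8JsonFileCache
{
    public static Task WriteAsync<T>(string filePath, T value)
    {
        // SerializeToUtf8Bytes already produces the final JSON; store those bytes unchanged.
        byte[] jsonUtf8Bytes = JsonSerializer.SerializeToUtf8Bytes(value);
        return File.WriteAllBytesAsync(filePath, jsonUtf8Bytes);
    }

    public static T? Read<T>(string filePath)
    {
        // The file holds UTF-8 JSON, so a single Deserialize call over the span suffices.
        ReadOnlySpan<byte> jsonUtf8Bytes = File.ReadAllBytes(filePath);
        return JsonSerializer.Deserialize<T>(jsonUtf8Bytes);
    }
}
```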