I have a generic JSON file caching service that takes a .NET object and writes it to disk as default/typical minified JSON. While this works nicely, I wanted to try improving it a bit by storing and working with UTF-8 byte arrays.
// getFunc() is a Func<T> that just defines what the shape of the data is
byte[] bytes = JsonSerializer.SerializeToUtf8Bytes(getFunc());
using FileStream stream = File.Create(filePath);
await JsonSerializer.SerializeAsync(stream, bytes);
await stream.DisposeAsync();
While trying to get this to work, I'm using a sample/dummy JSON object that I grabbed online, for which I generated the corresponding .NET POCOs. When this code runs, the serialization succeeds and my JSON file has this content:
"eyJkYXRhIjpbeyJ0eXBlIjoiYXJ0aWNsZXMiLCJpZCI6IjEiLCJhdHRyaWJ1dGVzIjp7InRpdGxlIjoiSlNPTjpBUEkgcGFpbnRzIG15IGJpa2UiLCJib2R5IjoiVGhlIHNob3J0ZXN0IGFydGljbGUgZXZlciIsImNyZWF0ZWQiOiIyMDIzLTA4LTA5VDA3OjI1OjQwLjI5MTQ4MDQtMDQ6MDAiLCJ1cGRhdGVkIjoiMjAyMy0wOC0wOVQwNzoyNTo0MC4yOTE1MTY5LTA0OjAwIn0sInJlbGF0aW9uc2hpcHMiOnsiYXV0aG9yIjp7ImRhdGEiOnsiaWQiOiI0MiIsInR5cGUiOiJwZW9wbGUifX19fV0sImluY2x1ZGVkIjpbeyJ0eXBlIjoicGVvcGxlIiwiaWQiOiI0MiIsImF0dHJpYnV0ZXMiOnsibmFtZSI6Ik1hdHQiLCJhZ2UiOjMyLCJnZW5kZXIiOiJtYWxlIn19XX0="
When decoded through an online Base64 decoder, it yields the following JSON:
{"data":[{"type":"articles","id":"1","attributes":{"title":"JSON:API paints my bike","body":"The shortest article ever","created":"2023-08-09T07:25:40.2914804-04:00","updated":"2023-08-09T07:25:40.2915169-04:00"},"relationships":{"author":{"data":{"id":"42","type":"people"}}}}],"included":[{"type":"people","id":"42","attributes":{"name":"Matt","age":32,"gender":"male"}}]}
So here's where the confusion is: when trying to deserialize this byte[] from UTF-8 (following this documentation section - https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json/how-to?pivots=dotnet-6-0#deserialize-from-utf-8), it says "this example assumes the JSON is in a byte array named jsonUtf8Bytes". This makes it seem like all that should be needed here are these two lines; however, they throw a pretty bland JSON exception that is causing me some grief:
ReadOnlySpan<byte> jsonSpan = File.ReadAllBytes(filePath);
var content = JsonSerializer.Deserialize<T>(jsonSpan);
System.Text.Json.JsonException: The JSON value could not be converted to MyProject.Rootobject. Path: $ | LineNumber: 0 | BytePositionInLine: 502.
at System.Text.Json.ThrowHelper.ThrowJsonException_DeserializeUnableToConvertValue(Type propertyType)
at System.Text.Json.Serialization.Converters.ObjectDefaultConverter`1.OnTryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, T& value)
at System.Text.Json.Serialization.JsonConverter`1.TryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, T& value)
at System.Text.Json.Serialization.JsonConverter`1.ReadCore(Utf8JsonReader& reader, JsonSerializerOptions options, ReadStack& state)
at System.Text.Json.JsonSerializer.ReadFromSpan[TValue](ReadOnlySpan`1 utf8Json, JsonTypeInfo jsonTypeInfo, Nullable`1 actualByteCount)
at System.Text.Json.JsonSerializer.Deserialize[TValue](ReadOnlySpan`1 utf8Json, JsonSerializerOptions options)
After hammering away trying to make this work, I was finally able to get the byte[] file to deserialize back to the original .NET object, but only by first deserializing the file contents as a string and then converting that string from Base64 back to a byte[]:
var fileBytes = File.ReadAllBytes(filePath);
var value = JsonSerializer.Deserialize<string>(fileBytes);
byte[] bytes = Convert.FromBase64String(value);
var content = JsonSerializer.Deserialize<T>(bytes);
This feels a bit redundant just to get back to the original object. I feel as though there is something I'm missing, but I don't know what I don't know. When the Deserialize call throws the exception, the byte array has a length of 502, but it is only ~374 bytes after adding the extra Base64-conversion step that yields a byte[] that deserializes correctly. Am I missing something, or is this the expected way to go about working with objects stored as byte[]?
The bug is in the code: it's serializing the data twice and using at least twice as much RAM. SerializeToUtf8Bytes already produces the UTF-8 JSON, and SerializeAsync then serializes that byte[] as a JSON value; System.Text.Json writes a byte[] as a Base64-encoded string, which is exactly what ended up in the file. There's no reason to try and use SerializeToUtf8Bytes. The JsonSerializer class already uses Utf8JsonWriter, along with buffers coming from a buffer pool. It's already reusing buffers where needed.
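For example (a minimal sketch, reusing the question's getFunc() and filePath), the write path collapses to a single call that streams straight to the file:
// Serialize the object directly to the file stream; no intermediate byte[] is needed
await using FileStream stream = File.Create(filePath);
await JsonSerializer.SerializeAsync(stream, getFunc());
The file then contains the plain minified JSON shown in the decoded output above, not a Base64 string.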
The source code shows that SerializeAsync passes the call to JsonTypeInfo.SerializeAsync:
JsonTypeInfo<TValue> jsonTypeInfo = GetTypeInfo<TValue>(options);
return jsonTypeInfo.SerializeAsync(utf8Json, value, cancellationToken);
That method in turn uses cached writers with buffers from a reusable pool:
using var bufferWriter = new PooledByteBufferWriter(Options.DefaultBufferSize);
Utf8JsonWriter writer = Utf8JsonWriterCache.RentWriter(Options, bufferWriter);
try
{
    SerializeHandler(writer, rootValue!);
    writer.Flush();
}
finally
{
    // Record the serialization size in both successful and failed operations,
    // since we want to immediately opt out of the fast path if it exceeds the threshold.
    OnRootLevelAsyncSerializationCompleted(writer.BytesCommitted + writer.BytesPending);
    Utf8JsonWriterCache.ReturnWriter(writer);
}
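With the file written that way it holds plain UTF-8 JSON, so the read side is just as direct. A rough sketch, assuming the same filePath and T as in the question:
// Deserialize straight from the stream; File.ReadAllBytes + Deserialize<T>(bytes) works too,
// since the file content is now UTF-8 JSON rather than a Base64-encoded string
await using FileStream stream = File.OpenRead(filePath);
T? content = await JsonSerializer.DeserializeAsync<T>(stream);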