
Enforce the use of Utf8JsonReader.ValueSequence for test purposes


I wrote my own BigIntegerConverter for JSON serialization/deserialization (.NET System.Text.Json). In the Read method I check whether ValueSequence is used:

...
string stringValue;
if (reader.HasValueSequence)
{
    stringValue = Encoding.UTF8.GetString(reader.ValueSequence);
}
else
{
    stringValue = Encoding.UTF8.GetString(reader.ValueSpan);
}

if (BigInteger.TryParse(stringValue, CultureInfo.InvariantCulture, out var result))
{
    return result;
}
...

Now I want to test that code, but so far I can only reach the else branch. Based on the documentation, I assumed that ValueSequence would be used once the data got big enough. However, I am already testing with BigIntegers as big as BigInteger.Pow(new BigInteger(long.MaxValue), 1234); and still cannot get ValueSequence to be used.

Did I misunderstand something? Is there a way to force the use of ValueSequence for test purposes?

My test case looks like this:

[Theory]
[MemberData(nameof(GetNotNullTestData))]
public void Read_EntityWithNotNullableBigInteger(string name, BigInteger expected, string value)
{
    // Arrange
    var json = $$"""{"Name":"{{name}}","NotNullableValue":{{value}}}""";
    // Act
    var result = JsonSerializer.Deserialize<NotNullableBigIntegerEntity>(json, _options);
    // Assert
    Assert.NotNull(result);
    Assert.Equal(name, result.Name);
    Assert.Equal(expected, result.NotNullableValue);
}

Regards Michael


Solution

  • Utf8JsonReader has a constructor that takes a ReadOnlySequence<byte>, so you can take your JSON string, encode it to a UTF-8 byte array, break that array into small chunks, and then convert the chunks into a ReadOnlySequence<byte>. The ReadOnlySequenceFactory from this answer to "Deserialize very large json from a chunked array of strings using system.text.json" does the conversion.

    First, introduce the following factory class:

    // From this answer https://stackoverflow.com/a/61087772 to https://stackoverflow.com/questions/61079767/deserialize-very-large-json-from-a-chunked-array-of-strings-using-system-text-js
    public static class ReadOnlySequenceFactory
    {
        public static ReadOnlySequence<T> AsSequence<T>(this IEnumerable<T []> buffers) => ReadOnlyMemorySegment<T>.Create(buffers.Select(a => new ReadOnlyMemory<T>(a)));
        public static ReadOnlySequence<T> AsSequence<T>(this IEnumerable<ReadOnlyMemory<T>> buffers) => ReadOnlyMemorySegment<T>.Create(buffers);
    
        // There is no public concrete implementation of ReadOnlySequenceSegment<T> so we must create one ourselves.
        // This is modeled on https://github.com/dotnet/runtime/blob/v5.0.18/src/libraries/System.Text.Json/tests/BufferFactory.cs
        // by https://github.com/ahsonkhan
        class ReadOnlyMemorySegment<T> : ReadOnlySequenceSegment<T>
        {
            public static ReadOnlySequence<T> Create(IEnumerable<ReadOnlyMemory<T>> buffers)
            {
                ReadOnlyMemorySegment<T>? first = null;
                ReadOnlyMemorySegment<T>? current = null;
                foreach (var buffer in buffers)
                {
                    var next = new ReadOnlyMemorySegment<T> { Memory = buffer };
                    if (first == null)
                        first = next;
                    else
                    {
                        current!.Next = next;
                        next.RunningIndex = current.RunningIndex + current.Memory.Length;
                    }
                    current = next;
                }
                if (first == null)
                    first = current = new ();
    
                return new ReadOnlySequence<T>(first, 0, current!, current!.Memory.Length);
            }
        }
    }
    

    Then write your converter e.g. as follows:

    public class BigIntegerConverter : JsonConverter<BigInteger>
    {
        // The actual implementation seems to be in INumberBase<TSelf>.TryParse() so I had to do this to call the method:
        static bool TryParse<TSelf>(ReadOnlySpan<byte> utf8Text, out TSelf? value) where TSelf : IUtf8SpanParsable<TSelf>, new() => 
            TSelf.TryParse(utf8Text, NumberFormatInfo.InvariantInfo, out value);
    
        public override BigInteger Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        {
            if (reader.TokenType != JsonTokenType.Number)
                throw new JsonException(string.Format("Found token {0} but expected token {1}", reader.TokenType, JsonTokenType.Number ));
            var utf8Text = reader.HasValueSequence ? reader.ValueSequence.ToArray() : reader.ValueSpan;
            if (TryParse<BigInteger>(utf8Text, out var value))
                return value;
            throw new JsonException();
        }
    
        public override void Write(Utf8JsonWriter writer, BigInteger value, JsonSerializerOptions options) =>
            writer.WriteRawValue(value.ToString(NumberFormatInfo.InvariantInfo), false);
    }
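
    The converter above copies a multi-segment value with ToArray(), which allocates a new array for every such number. If that matters, one possible variation is to copy the segments into a pooled buffer instead. This is only a sketch under stated assumptions: the names Utf8NumberParsing, ReadBigInteger and ParseUtf8 are made up for illustration, and it assumes .NET 8 or later, where BigInteger satisfies IUtf8SpanParsable<BigInteger>.

    ```csharp
    using System;
    using System.Buffers;
    using System.Globalization;
    using System.Numerics;
    using System.Text.Json;

    // Hypothetical helper, not part of System.Text.Json.
    public static class Utf8NumberParsing
    {
        // Generic wrapper so the static abstract IUtf8SpanParsable<T>.TryParse can be called.
        static T ParseUtf8<T>(ReadOnlySpan<byte> utf8Text) where T : IUtf8SpanParsable<T> =>
            T.TryParse(utf8Text, NumberFormatInfo.InvariantInfo, out var value)
                ? value
                : throw new JsonException("Could not parse the number token.");

        // Reads the current number token as a BigInteger, whichever way the reader exposes it.
        public static BigInteger ReadBigInteger(ref Utf8JsonReader reader)
        {
            if (reader.TokenType != JsonTokenType.Number)
                throw new JsonException($"Found token {reader.TokenType} but expected {JsonTokenType.Number}");

            // Contiguous token: parse the span directly, no copy needed.
            if (!reader.HasValueSequence)
                return ParseUtf8<BigInteger>(reader.ValueSpan);

            // Token spans several segments: copy it into a rented buffer to make it contiguous.
            int length = checked((int)reader.ValueSequence.Length);
            byte[] rented = ArrayPool<byte>.Shared.Rent(length);
            try
            {
                reader.ValueSequence.CopyTo(rented.AsSpan());
                return ParseUtf8<BigInteger>(rented.AsSpan(0, length));
            }
            finally
            {
                ArrayPool<byte>.Shared.Return(rented);
            }
        }
    }
    ```

    With this helper, the body of Read() could shrink to `return Utf8NumberParsing.ReadBigInteger(ref reader);`. Whether the pooling is worth the extra code depends on how often large numbers actually arrive in multiple segments.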
    

    Now you will be able to write your test method as follows, ensuring your input JSON is broken into small chunks:

    public record NotNullableBigIntegerEntity(string Name, BigInteger NotNullableValue);
    
    JsonSerializerOptions _options = new()
    {
        Converters = { new BigIntegerConverter() },
    };
    
    public void Read_EntityWithNotNullableBigInteger(string name, BigInteger expected, string value)
    {
        int byteChunkSize = 3;
        
        // Arrange
        var json = $$"""{"Name":"{{name}}","NotNullableValue":{{value}}}""";
        // Break into chunks
        var utf8json = Encoding.UTF8.GetBytes(json);
        var sequence = utf8json.Chunk(byteChunkSize).AsSequence();
        // Act
        var reader = new Utf8JsonReader(sequence);
        var result = JsonSerializer.Deserialize<NotNullableBigIntegerEntity>(ref reader, _options);
        // Assert
        Assert.NotNull(result);
        Assert.Equal(name, result?.Name);
        Assert.Equal(expected, result?.NotNullableValue);
    }
    

    Notes:

    • BigInteger implements IUtf8SpanParsable<BigInteger>, which allows direct parsing from UTF-8 encoded byte spans. As such, there is no need to call Encoding.UTF8.GetString() to construct a UTF-16 string corresponding to the current value.

    • "Based on the documentation I assumed that ValueSequence will be used if the data got big enough." Maybe, maybe not. Microsoft seems to have intended ReadOnlySequence<byte> to be used when deserializing asynchronously from a request or response stream, but you are deserializing synchronously from an in-memory string. Whether the serializer presents that string to the reader as a single UTF-8 byte span or as a multi-segment byte sequence is an undocumented implementation detail.

      (When I check the reference source, the current code seems to convert the incoming string to a single byte span.)

    Demo fiddle here.
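
    To see the difference between the two input shapes in isolation, the following sketch reads the same JSON twice: once as a single contiguous segment and once split into two segments in the middle of the number. The names ValueSequenceDemo, Segment, SplitAt and NumberUsesValueSequence are hypothetical and exist only for this illustration; the Segment class is a cut-down stand-in for the ReadOnlyMemorySegment<T> shown earlier.

    ```csharp
    using System;
    using System.Buffers;
    using System.Text;
    using System.Text.Json;

    // Hypothetical minimal segment type, used only to link two buffers together.
    public sealed class Segment : ReadOnlySequenceSegment<byte>
    {
        public Segment(ReadOnlyMemory<byte> memory) => Memory = memory;

        public Segment Append(ReadOnlyMemory<byte> memory)
        {
            var next = new Segment(memory) { RunningIndex = RunningIndex + Memory.Length };
            Next = next;
            return next;
        }
    }

    public static class ValueSequenceDemo
    {
        // Builds a two-segment sequence over the same bytes, split at the given index.
        public static ReadOnlySequence<byte> SplitAt(byte[] utf8, int index)
        {
            var first = new Segment(utf8.AsMemory(0, index));
            var last = first.Append(utf8.AsMemory(index));
            return new ReadOnlySequence<byte>(first, 0, last, last.Memory.Length);
        }

        // Reports whether the first number token is exposed through ValueSequence.
        public static bool NumberUsesValueSequence(ReadOnlySequence<byte> utf8Json)
        {
            var reader = new Utf8JsonReader(utf8Json);
            while (reader.Read())
            {
                if (reader.TokenType == JsonTokenType.Number)
                    return reader.HasValueSequence;
            }
            throw new InvalidOperationException("No number token found.");
        }

        public static void Main()
        {
            byte[] utf8 = Encoding.UTF8.GetBytes("{\"Value\":12345678901234567890}");

            // One contiguous segment: the reader can expose the number via ValueSpan.
            Console.WriteLine(NumberUsesValueSequence(new ReadOnlySequence<byte>(utf8)));

            // Split five bytes before the end, so the number straddles both segments.
            Console.WriteLine(NumberUsesValueSequence(SplitAt(utf8, utf8.Length - 5)));
        }
    }
    ```

    The first call should report False and the second True, because the reader only falls back to ValueSequence when a token crosses a segment boundary. This is also why the size of the number alone never triggered that branch in the original test.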