
Using streams to create a BSON byte array via Json.NET (for a file format)


We need the BSON equivalent of:

{
    "Header": {
        "SubHeader1": {
            "Name": "Bond",
            "License": 7
        },
        "SubHeader2": {
            "IsActive": true
        }
    },
    "Payload": /* This will be a 40GB byte stream! */
}

But in the file we actually get, the payload comes FIRST, and the header only follows after it!

We're using Json.NET's BSON writer (Bson.BsonWriter.WriteValue(byte[] value)), but it only accepts an actual byte[], not a Stream. Since our payloads will be tens of GB, we must use streams, so we've tried to work around this (code below), but that gives us the incorrect result described above.

using System.IO;
using System.Text;
using Newtonsoft.Json;
using Newtonsoft.Json.Bson;

public void Expt()
{
    // Just some structure classes, defined below
    var fileStruct = new FileStructure();

    using (Stream outputSt = new FileStream("TestBinary.bson", FileMode.Create))
    {
        var serializer = new JsonSerializer();
        var bw = new BsonWriter(outputSt);

        // Start
        bw.WriteStartObject();

        // Write header
        bw.WritePropertyName("Header");
        serializer.Serialize(bw, fileStruct.Header);

        // Write payload
        bw.WritePropertyName("Payload");
        bw.Flush(); // <== flush!
        // In reality we write 40GB into the stream; this is a dummy example for now
        byte[] dummyPayload = Encoding.UTF8.GetBytes("This will be a 40GB byte stream!");
        outputSt.Write(dummyPayload, 0, dummyPayload.Length);

        // End
        bw.WriteEndObject();
    }
}

This looks like a classic case of unflushed buffers, even though we do call Flush on the Json.NET writer before writing the payload to the underlying stream.
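A likely explanation, as far as we can tell from the BSON format and Json.NET's source: every BSON document starts with its total size as a little-endian int32, so BsonWriter cannot emit anything until it knows the size of the whole document. It appears to buffer the object tree in memory and write it out only when the root object is closed, while Flush just flushes the underlying stream. That would put the raw payload bytes written directly to outputSt first in the file. The same int32 length field also caps a BSON binary value (and a whole document) at just under 2 GB, so a 40GB payload could not be a single BSON binary field anyway. The BCL-only sketch below hand-encodes {"IsActive": true} to show where the length prefix comes from; BsonLengthDemo and EncodeBoolDocument are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class BsonLengthDemo
{
    // Hypothetical helper: hand-encodes the one-element BSON document {"<name>": <value>}.
    // BSON layout: int32 total length | elements | 0x00 terminator.
    public static byte[] EncodeBoolDocument(string name, bool value)
    {
        byte[] body;
        using (var ms = new MemoryStream())
        {
            ms.WriteByte(0x08);                              // element type 0x08 = boolean
            byte[] nameBytes = Encoding.UTF8.GetBytes(name);
            ms.Write(nameBytes, 0, nameBytes.Length);        // element name (cstring)...
            ms.WriteByte(0x00);                              // ...terminated by 0x00
            ms.WriteByte(value ? (byte)0x01 : (byte)0x00);   // boolean value
            ms.WriteByte(0x00);                              // end of the document
            body = ms.ToArray();
        }

        // The length prefix counts itself plus the body, so it is only known
        // after the whole body has been produced. A BSON writer therefore has
        // to hold the document back until it is complete.
        int total = 4 + body.Length;
        var result = new byte[total];
        result[0] = (byte)(total & 0xFF);                    // little-endian int32
        result[1] = (byte)((total >> 8) & 0xFF);
        result[2] = (byte)((total >> 16) & 0xFF);
        result[3] = (byte)((total >> 24) & 0xFF);
        Array.Copy(body, 0, result, 4, body.Length);
        return result;
    }
}
```

For EncodeBoolDocument("IsActive", true) the body is 12 bytes, so the document is 16 bytes long and starts with 0x10 0x00 0x00 0x00.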

Question: Is there another way to do this? We'd rather not fork Json.NET's source (and explore its internal plumbing) or reinvent the wheel somehow...


Details: the supporting structure classes (in case you want to reproduce this) are:

public class FileStructure
{
    public TopHeader Header { get; set; }
    public byte[] Payload { get; set; }

    public FileStructure()
    {
        Header = new TopHeader
            {
                SubHeader1 = new SubHeader1 {Name = "Bond", License = 007},
                SubHeader2 = new SubHeader2 {IsActive = true}
            };
    }
}

public class TopHeader
{
    public SubHeader1 SubHeader1 { get; set; }
    public SubHeader2 SubHeader2 { get; set; }
}

public class SubHeader1
{
    public string Name { get; set; }
    public int License { get; set; }
}

public class SubHeader2
{
    public bool IsActive { get; set; }
}

Solution

  • OK, so we settled on a middle ground, because we don't have time at the moment to fix the otherwise great Json.NET library. Since we're lucky enough to have the Stream only at the end, we now use BSON just for the header (which is small enough for a byte[]) and then write the payload after it with a standard stream writer, i.e. the layout is:

    {
        "SubHeader1": {
            "Name": "Bond",
            "License": 7
        },
        "SubHeader2": {
            "IsActive": true
        }
    } /* End of valid BSON */
    // <= Our Stream is written here, raw byte stream, no BSON
    

    A uniform BSON layout would have been more elegant, but in its absence this works great too, and is probably a bit faster as well. If anyone finds a better answer in the future, we're listening.
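The header-then-raw-payload layout can be written and read back with plain streams. Below is a minimal sketch, assuming the header has already been serialized to a BSON byte[]. The writer copies the payload in chunks with Stream.CopyTo, so it never has to fit in memory. The reader uses the int32 length that starts every BSON document to find where the header ends. HeaderPlusPayload, WriteContainer, ReadHeader and ReadFully are hypothetical names, and the code itself does not depend on Json.NET.

```csharp
using System;
using System.IO;

public static class HeaderPlusPayload
{
    // Hypothetical: writes a complete BSON document (the header) followed by raw payload bytes.
    public static void WriteContainer(Stream output, byte[] headerBson, Stream payload)
    {
        output.Write(headerBson, 0, headerBson.Length);
        payload.CopyTo(output, 81920); // streams the payload in 80 KB chunks
    }

    // Hypothetical: reads the BSON header back. A BSON document begins with its
    // total length (including these 4 bytes) as a little-endian int32, so we know
    // exactly where the header ends. On return, 'input' is positioned at the
    // first payload byte.
    public static byte[] ReadHeader(Stream input)
    {
        var prefix = new byte[4];
        ReadFully(input, prefix, 0, 4);
        int length = prefix[0] | (prefix[1] << 8) | (prefix[2] << 16) | (prefix[3] << 24);
        if (length < 5) // the smallest valid document, {}, is 5 bytes
            throw new InvalidDataException("Not a valid BSON document length.");

        var header = new byte[length];
        Array.Copy(prefix, header, 4);
        ReadFully(input, header, 4, length - 4);
        return header;
    }

    // Stream.Read may return fewer bytes than requested, so keep reading until done.
    private static void ReadFully(Stream input, byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            int read = input.Read(buffer, offset, count);
            if (read == 0)
                throw new EndOfStreamException("Stream ended inside the BSON header.");
            offset += read;
            count -= read;
        }
    }
}
```

Usage, assuming Json.NET's BsonWriter as in the question (payload.bin is a hypothetical payload source):

```csharp
byte[] headerBson;
using (var ms = new MemoryStream())
{
    using (var bw = new BsonWriter(ms))
    {
        new JsonSerializer().Serialize(bw, fileStruct.Header);
    }
    headerBson = ms.ToArray(); // ToArray still works after the writer has closed ms
}

using (var output = new FileStream("TestBinary.bson", FileMode.Create))
using (var payload = File.OpenRead("payload.bin"))
{
    HeaderPlusPayload.WriteContainer(output, headerBson, payload);
}
```

On the reading side, the bytes returned by ReadHeader can be wrapped in a MemoryStream and deserialized with Json.NET's BsonReader, after which the same input stream continues with the raw payload.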