Can I serialize a protobuf object in multiple chunks?


I want to send my serialized protobuf object in chunks no larger than MAX_SIZE bytes.

If I call SerializeToArray(buff, MAX_SIZE) multiple times, will it keep serializing the first MAX_SIZE bytes each time, or will it actually do what I expect (serializing the entire object in consecutive chunks of MAX_SIZE bytes at most)?

From the few experiments I made, it looks like it does the former. If that's indeed the case, is there a way to achieve what I want without having to first serialize the entire object into a temporary array, and then retrieve data from that array in chunks of MAX_SIZE bytes at most?

Note: I work in C/C++ though I believe this should not impact the answer.


Solution

  • To do this, you need to do the chunking yourself.

    There are two approaches:

    1. Decompose the structure of the original message into a collection of smaller messages, such that each one, when serialised, produces wire data that is small enough.
    2. Serialize the original message to an array big enough to contain the whole thing, and then break that array up into separate chunks, perhaps sending each chunk within a simple GPB message that contains one chunk and a description of where it fits in the sequence of chunks.

    Option 1 gets fiddly.

    Option 2 is like introducing a lower layer in the protocol between your sender and receiver. The simple message could be something like:

    message Chunk
    {
        int32 messageNum = 1;
        int32 chunkNum = 2;
        int32 totalChunks = 3;
        bytes chunkData = 4;
    }
    

    where you restrict the size of chunkData so that, when you serialize a Chunk, you get at most MAX_SIZE bytes. You'd serialize your original message to an array, and then iterate through that array, generating Chunks and sending them. The receiver would reconstitute the array from the chunkData and, once it is complete, call ParseFromArray() to deserialise the original message.
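    How much room the Chunk envelope itself takes can be estimated from the protobuf wire format. The sketch below is a rough worst-case bound, not part of any protobuf API: the helper names (VarintSize, MaxChunkWireSize, MaxChunkData) are made up for illustration, and it assumes the three int32 fields never hold negative values (a negative int32 is encoded as a 10-byte varint, a non-negative one as at most 5 bytes) and that all field numbers stay below 16, so each tag is one byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of bytes a protobuf varint needs to encode v (7 payload bits per byte).
std::size_t VarintSize(std::uint64_t v) {
    std::size_t n = 1;
    while (v >= 0x80) {
        v >>= 7;
        ++n;
    }
    return n;
}

// Worst-case serialized size of a Chunk carrying dataLen bytes of chunkData.
// Assumptions: messageNum, chunkNum and totalChunks are non-negative, and
// every field number is below 16 (1-byte tag).
std::size_t MaxChunkWireSize(std::size_t dataLen) {
    const std::size_t intField = 1 + 5;  // tag + worst-case non-negative int32
    const std::size_t bytesHeader = 1 + VarintSize(dataLen);  // tag + length prefix
    return 3 * intField + bytesHeader + dataLen;
}

// Largest chunkData length whose worst-case Chunk still fits in maxSize bytes.
// Returns 0 if not even a 1-byte payload fits.
std::size_t MaxChunkData(std::size_t maxSize) {
    std::size_t len = maxSize;
    while (len > 0 && MaxChunkWireSize(len) > maxSize) {
        --len;
    }
    return len;
}
```

    With the real generated class, the result can be double-checked by calling ByteSizeLong() on a populated Chunk before sending it.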

    The messageNum may be necessary to identify which of the original messages a chunk belongs to. For example, if the channel between sender and receiver is not "perfect", the receiver may start receiving Chunks partway through a message; the messageNum lets the receiver spot what is going on.
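    A minimal sketch of option 2, sender and receiver side, might look like the following. To keep it self-contained, a plain struct stands in for the generated Chunk class and a std::string stands in for the serialized bytes (in real code these would come from SerializeToString() or SerializeToArray() on the original message). SplitIntoChunks, Reassembler and maxData are hypothetical names, and maxData is assumed to have been chosen so that a serialized Chunk stays within MAX_SIZE.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Plain stand-in for the protoc-generated Chunk class (illustration only).
struct Chunk {
    std::int32_t messageNum;
    std::int32_t chunkNum;
    std::int32_t totalChunks;
    std::string chunkData;
};

// Sender: split already-serialized bytes into pieces of at most maxData bytes.
// An empty input still yields one (empty) chunk so the receiver sees the message.
std::vector<Chunk> SplitIntoChunks(const std::string& wire,
                                   std::int32_t messageNum,
                                   std::size_t maxData) {
    assert(maxData > 0);
    const std::size_t total =
        std::max<std::size_t>(1, (wire.size() + maxData - 1) / maxData);
    std::vector<Chunk> chunks;
    for (std::size_t i = 0; i < total; ++i) {
        chunks.push_back(Chunk{messageNum, static_cast<std::int32_t>(i),
                               static_cast<std::int32_t>(total),
                               wire.substr(i * maxData, maxData)});
    }
    return chunks;
}

// Receiver: collects the chunks of one message at a time.
class Reassembler {
public:
    // Feeds one chunk. Returns true and fills *out once every chunk of the
    // current message has arrived; the caller would then parse *out.
    bool Add(const Chunk& c, std::string* out) {
        if (c.totalChunks <= 0 || c.chunkNum < 0 || c.chunkNum >= c.totalChunks) {
            return false;  // malformed chunk, ignore it
        }
        const std::size_t total = static_cast<std::size_t>(c.totalChunks);
        const std::size_t index = static_cast<std::size_t>(c.chunkNum);
        if (!started_ || c.messageNum != messageNum_ || total != parts_.size()) {
            // A different message has begun: discard any partial state.
            started_ = true;
            messageNum_ = c.messageNum;
            parts_.assign(total, std::string());
            seen_.assign(total, false);
            received_ = 0;
        }
        if (!seen_[index]) {
            seen_[index] = true;
            parts_[index] = c.chunkData;
            ++received_;
        }
        if (received_ != parts_.size()) {
            return false;
        }
        out->clear();
        for (const std::string& p : parts_) {
            *out += p;
        }
        started_ = false;
        return true;
    }

private:
    bool started_ = false;
    std::int32_t messageNum_ = 0;
    std::vector<std::string> parts_;
    std::vector<bool> seen_;
    std::size_t received_ = 0;
};
```

    Once Add() returns true, the receiver would pass the reassembled bytes to ParseFromArray() or ParseFromString() on the original message type. Dropping the partial state whenever messageNum changes is one simple policy for the "joined partway through" case; a lossy channel would also need timeouts or retransmission, which this sketch does not attempt.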