Do cereal and Boost Serialization use zero-copy?

I have done some performance comparison between several serialization protocols, including FlatBuffers, Cap'n Proto, Boost serialization and cereal. All the tests are written in C++.

I know that FlatBuffers and Cap'n Proto use zero-copy. With zero-copy, serialization time is effectively zero, but the serialized objects are larger.

I thought that cereal and Boost serialization did not use zero-copy. However, their serialization time (for int and double) is nearly zero, and the size of the serialized objects is nearly the same as with Cap'n Proto or FlatBuffers. I didn't find any information about zero-copy in their documentation.

Do cereal and Boost serialization use zero-copy too?

Solution

Boost and Cereal do not implement zero-copy in the sense of Cap'n Proto or FlatBuffers.

With true zero-copy serialization, the backing store for your live in-memory objects is in fact exactly the same memory segment that is passed to the read() or write() system calls. There is no packing/unpacking step at all.

Generally, this has a number of implications:

Objects are not allocated using new/delete. When constructing a message, you allocate the message first, which allocates a long contiguous memory space for the message contents. You then allocate the message structure directly inside the message, receiving pointers that in fact point into the message's memory. When the message is later written, a single write() call shoves this whole memory space out to the wire.
Similarly, when you read in a message, a single read() call (or maybe 2-3) reads the entire message into one block of memory. You then get a pointer (or a pointer-like object) to the "root" of the message, which you can use to traverse it. Note that no part of the message is actually inspected until your application traverses it.
With normal sockets, the only copies of your data happen in kernel space. With RDMA networking, you may even be able to avoid kernel-space copies: the data comes off the wire directly into its final memory location.
When working with files (rather than networks) it's possible to mmap() a very large message directly from disk and use the mapped memory region directly. Doing so is O(1) -- it doesn't matter how big the file is. Your operating system will automatically page in the necessary parts of the file when you actually access them.
Two processes on the same machine can communicate through shared memory segments with no copies. Note that, generally, regular old C++ objects do not work well in shared memory, because the memory segments usually don't have the same address in both memory spaces, thus all the pointers are wrong. With a zero-copy serialization framework, the pointers are usually expressed as offsets rather than absolute addresses, so that they are position-independent.
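
To make the offset idea concrete, here is a minimal sketch of a message built directly inside one contiguous buffer and read back through a pointer-like view. The 16-byte layout and the names build_message and MessageView are hypothetical, invented for this illustration; this is not the wire format or API of Cap'n Proto or FlatBuffers, and it assumes the writer and reader share endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <string>
#include <vector>

// Hypothetical layout (an assumption for this sketch, not a real wire format):
//   bytes 0..3   uint32 offset of the name, relative to the start of the buffer
//   bytes 4..7   uint32 length of the name in bytes
//   bytes 8..15  double value
//   bytes 16..   name bytes
static_assert(sizeof(double) == 8, "sketch assumes 8-byte doubles");
constexpr std::size_t kNameOffsetPos = 0;
constexpr std::size_t kNameLengthPos = 4;
constexpr std::size_t kValuePos = 8;
constexpr std::size_t kHeaderSize = 16;

// memcpy-based accessors avoid alignment and aliasing problems.
template <typename T>
void store(unsigned char* at, T v) { std::memcpy(at, &v, sizeof v); }

template <typename T>
T load(const unsigned char* at) { T v; std::memcpy(&v, at, sizeof v); return v; }

// The message is laid out in its final form inside a single buffer.
// That buffer could be handed to write() as-is.
std::vector<unsigned char> build_message(const std::string& name, double value) {
    assert(name.size() <= std::numeric_limits<std::uint32_t>::max());
    std::vector<unsigned char> buf(kHeaderSize + name.size());
    store<std::uint32_t>(buf.data() + kNameOffsetPos,
                         static_cast<std::uint32_t>(kHeaderSize));
    store<std::uint32_t>(buf.data() + kNameLengthPos,
                         static_cast<std::uint32_t>(name.size()));
    store<double>(buf.data() + kValuePos, value);
    if (!name.empty()) {
        std::memcpy(buf.data() + kHeaderSize, name.data(), name.size());
    }
    return buf;
}

// A pointer-like view: nothing is parsed until an accessor is called.
// It only stores the base address and resolves offsets relative to it,
// so the same bytes work at any address (no bounds checks in this sketch).
class MessageView {
public:
    explicit MessageView(const unsigned char* base) : base_(base) {}

    double value() const { return load<double>(base_ + kValuePos); }

    std::size_t name_size() const {
        return load<std::uint32_t>(base_ + kNameLengthPos);
    }

    // Points into the message buffer itself; no copy of the name is made.
    const char* name_data() const {
        const std::uint32_t off = load<std::uint32_t>(base_ + kNameOffsetPos);
        return reinterpret_cast<const char*>(base_ + off);
    }

private:
    const unsigned char* base_;
};
```

Copying the buffer to a different address (as would happen in another process's mapping of a shared segment) does not invalidate anything: constructing a MessageView over the new base address yields the same value and name, because the only "pointer" stored in the message is an offset.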

Boost and Cereal are different: when you receive a message in these systems, first a pass is performed over the entire message to "unpack" the contents. The final resting place of the data is in objects allocated in the traditional way using new/delete. Similarly, when sending a message, the data has to be collected from this tree of objects and packed together into one buffer in order to be written out. Even though Boost and Cereal are "extensible", being truly zero-copy requires a very different underlying design; it cannot be bolted on as an extension.
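
For contrast, the pack/unpack style can be sketched as follows. This is a hand-rolled illustration, not Boost or Cereal code; the Sample type, the pack and unpack functions, and the byte layout are all assumptions made up for the example. The point is only that the live object owns ordinary heap allocations, and every byte is copied once on the way out and once on the way in.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <string>
#include <vector>

// An ordinary C++ object: its members own separate heap allocations.
struct Sample {
    std::string label;
    std::vector<double> values;
};

// Packing walks the object and copies each piece into one output buffer.
std::vector<unsigned char> pack(const Sample& s) {
    assert(s.label.size() <= std::numeric_limits<std::uint32_t>::max());
    assert(s.values.size() <= std::numeric_limits<std::uint32_t>::max());
    std::vector<unsigned char> out;
    auto append = [&out](const void* p, std::size_t n) {
        const unsigned char* b = static_cast<const unsigned char*>(p);
        out.insert(out.end(), b, b + n);
    };
    const std::uint32_t label_len = static_cast<std::uint32_t>(s.label.size());
    const std::uint32_t count = static_cast<std::uint32_t>(s.values.size());
    append(&label_len, sizeof label_len);
    append(s.label.data(), label_len);
    append(&count, sizeof count);
    append(s.values.data(), count * sizeof(double));
    return out;
}

// Unpacking makes a full pass over the buffer and copies the data into
// freshly allocated objects. No bounds checks: trusted input is assumed.
Sample unpack(const std::vector<unsigned char>& in) {
    Sample s;
    std::size_t pos = 0;
    auto take = [&in, &pos](void* p, std::size_t n) {
        if (n == 0) return;
        std::memcpy(p, in.data() + pos, n);
        pos += n;
    };
    std::uint32_t label_len = 0;
    take(&label_len, sizeof label_len);
    s.label.resize(label_len);           // allocation
    take(&s.label[0], label_len);        // copy
    std::uint32_t count = 0;
    take(&count, sizeof count);
    s.values.resize(count);              // allocation
    take(s.values.data(), count * sizeof(double));  // copy
    return s;
}
```

For messages that hold only a few ints and doubles, both copies are a handful of bytes, so a benchmark on such messages can show near-zero serialization time for a packing serializer as well.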

That said, don't assume zero-copy will always be faster. memcpy() can be pretty fast, and the rest of your program may dwarf the cost. Meanwhile, zero-copy systems tend to have inconvenient APIs, particularly because of the restrictions on memory allocation. It may be overall a better use of your time to use a traditional serialization system.

The place where zero-copy is most obviously advantageous is when manipulating files, since as I mentioned you can easily mmap() a huge file and only read part of it. Non-zero-copy formats simply can't do that. When it comes to networking, though, the advantages are less clear, since the network communication itself is necessarily O(n).
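
As a rough sketch of the file case, the following POSIX-only example maps a file of raw doubles and reads a single element from it. The function names read_double_at and write_doubles and the file format (a bare array of native doubles) are assumptions for illustration; a real zero-copy format would put a structured message in the file instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Helper for the example: writes `count` raw doubles to `path`.
bool write_doubles(const char* path, const double* data, std::size_t count) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    const std::size_t written = std::fwrite(data, sizeof(double), count, f);
    return std::fclose(f) == 0 && written == count;
}

// Reads the index-th double from the file without read()ing the whole file.
// Only the page(s) actually touched need to be brought in by the OS.
// Returns false on any error or if the index is out of range.
bool read_double_at(const char* path, std::size_t index, double* out) {
    assert(out != nullptr);
    const int fd = open(path, O_RDONLY);
    if (fd < 0) return false;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return false; }
    const std::size_t size = static_cast<std::size_t>(st.st_size);
    if (size / sizeof(double) <= index) { close(fd); return false; }

    void* base = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    if (base == MAP_FAILED) return false;

    // Touch only the bytes we need.
    std::memcpy(out,
                static_cast<const unsigned char*>(base) + index * sizeof(double),
                sizeof(double));
    munmap(base, size);
    return true;
}
```

The cost of setting up the mapping does not depend on how much data the file holds in the way that parsing it would; a packing format would generally have to read and decode the file before it could hand back any one element.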

At the end of the day, if you really want to know which serialization system is fastest for your use case, you will probably need to try them all and measure them. Note that toy benchmarks are usually misleading; you need to test your actual use case (or something very similar) to get useful information.

Disclosure: I am the author of Cap'n Proto (a zero-copy serializer) and Protocol Buffers v2 (a popular non-zero-copy serializer).