protocol-buffers capnproto

Cap'n Proto maximum file size


At the moment we are using Protocol Buffers to exchange data between Python and C++. However, we are running into the maximum file size limitation of Protocol Buffers and are considering switching everything to Cap'n Proto. Since it is somewhat related to Protocol Buffers, I was wondering whether Cap'n Proto also has a limit on the maximum file size?


Solution

  • Cap'n Proto has a maximum file size of approximately 2^64 bytes, or 16 exbibytes -- which "should be enough for anyone". :)

    Cap'n Proto is in fact an excellent format for extremely large data files, because it supports random access and lazy loading. When reading a huge Cap'n Proto file, I recommend using mmap() to map the file into memory, then passing the bytes directly to the Cap'n Proto implementation (e.g. capnp::FlatArrayMessageReader in C++). This way, only the pages of the file that you actually use will be brought into memory by the operating system. (In contrast, with Protocol Buffers, it is necessary to parse the entire file upfront into in-memory data structures before you can access any of it.)
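A minimal sketch of the mmap() approach described above, assuming a POSIX system. `MappedFile` is a hypothetical helper written for this example, not part of Cap'n Proto; the hand-off to the library is shown only as a comment, since that part depends on having the Cap'n Proto C++ headers installed.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: read-only memory mapping of a whole file (POSIX only).
// Pages are loaded by the operating system only when they are touched.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("open failed: " + path);

        struct stat st;
        if (::fstat(fd, &st) != 0) {
            ::close(fd);
            throw std::runtime_error("fstat failed: " + path);
        }
        size_ = static_cast<std::size_t>(st.st_size);
        if (size_ == 0) {  // mmap() rejects zero-length mappings
            ::close(fd);
            throw std::runtime_error("file is empty: " + path);
        }

        void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
        ::close(fd);  // the mapping stays valid after the descriptor is closed
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed: " + path);
        data_ = p;
    }

    ~MappedFile() {
        if (data_ != nullptr) ::munmap(data_, size_);
    }

    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    const void* data() const { return data_; }
    std::size_t sizeBytes() const { return size_; }

    // Cap'n Proto measures messages in 8-byte words, so a flat message
    // file is expected to be a whole number of words.
    std::size_t sizeWords() const {
        assert(size_ % 8 == 0);
        return size_ / 8;
    }

private:
    void* data_ = nullptr;
    std::size_t size_ = 0;
};

// With the Cap'n Proto C++ library available, the mapping would be handed
// over roughly like this (not compiled as part of this sketch):
//
//   MappedFile file("data.bin");  // "data.bin" is a placeholder path
//   kj::ArrayPtr<const capnp::word> words(
//       reinterpret_cast<const capnp::word*>(file.data()), file.sizeWords());
//   capnp::FlatArrayMessageReader reader(words);
```

The mapping has to outlive the reader, because the reader points into the mapped pages rather than copying them.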

    Note that an individual List value in a Cap'n Proto structure has a limit of 2^29-1 elements. Text and Data (strings and byte blobs) are special kinds of lists, so any single contiguous text or byte blob is limited to just under 512 MiB. However, you can have multiple such blobs, so larger data can be stored in a single file by splitting it into pieces.
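One way to plan the splitting mentioned above is to compute byte ranges that each fit within the 2^29-1 element limit. The sketch below is only the arithmetic; `ChunkRange` and `planChunks` are hypothetical names, and storing each range as one Data element (for example in a list of Data values in your own schema) is an assumption about how you would lay out the file.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Limit quoted in the answer: a single List (and so Text/Data) holds
// at most 2^29 - 1 elements, i.e. bytes for a Data blob.
constexpr std::uint64_t kMaxListElements = (std::uint64_t{1} << 29) - 1;

// Hypothetical description of one piece of a larger blob.
struct ChunkRange {
    std::uint64_t offset;  // start of the piece within the original blob
    std::uint64_t length;  // number of bytes in the piece
};

// Split totalBytes into consecutive ranges of at most maxChunkBytes each.
std::vector<ChunkRange> planChunks(std::uint64_t totalBytes,
                                   std::uint64_t maxChunkBytes = kMaxListElements) {
    if (maxChunkBytes == 0) {
        throw std::invalid_argument("maxChunkBytes must be positive");
    }
    std::vector<ChunkRange> chunks;
    std::uint64_t offset = 0;
    while (offset < totalBytes) {
        const std::uint64_t length = std::min(maxChunkBytes, totalBytes - offset);
        chunks.push_back({offset, length});
        offset += length;
    }
    return chunks;
}
```

For instance, a 1 GiB blob does not fit in two pieces, because each piece is one byte short of 512 MiB; `planChunks` yields three ranges, the last one being 2 bytes long.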

    Note also that most Cap'n Proto implementations by default impose a "traversal limit" when reading a Cap'n Proto structure in order to defend against malicious data containing pointer loops. Typically this defaults to 64 MiB. For larger data, you'll need to override the limit -- in C++, pass a custom ReaderOptions (with traversalLimitInWords raised) to the MessageReader constructor.
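In the C++ library the limit is expressed in 8-byte words rather than bytes, so a small conversion helps when sizing it from a file size. The helpers below are a sketch with hypothetical names (`bytesToWords`, `traversalLimitFor`); the `passes` multiplier is an assumption for workloads that read the same data more than once, since repeated reads may count against the limit again. The Cap'n Proto calls appear only in a comment and are not compiled here.

```cpp
#include <cstdint>

constexpr std::uint64_t kBytesPerWord = 8;

// 8 Mi words of 8 bytes each, i.e. the typical 64 MiB default.
constexpr std::uint64_t kDefaultTraversalLimitWords = 8ULL * 1024 * 1024;

// Round a byte count up to whole 8-byte words.
constexpr std::uint64_t bytesToWords(std::uint64_t bytes) {
    return bytes / kBytesPerWord + (bytes % kBytesPerWord != 0 ? 1 : 0);
}

// Hypothetical sizing rule: allow the whole message to be traversed
// `passes` times before the reader gives up.
constexpr std::uint64_t traversalLimitFor(std::uint64_t messageBytes,
                                          std::uint64_t passes = 1) {
    return bytesToWords(messageBytes) * passes;
}

// With the Cap'n Proto C++ library available, the value would be used
// roughly like this, where `words` is the message as a
// kj::ArrayPtr<const capnp::word> and `fileSizeBytes` is its size:
//
//   capnp::ReaderOptions options;
//   options.traversalLimitInWords = traversalLimitFor(fileSizeBytes);
//   capnp::FlatArrayMessageReader reader(words, options);
```

Raising the limit weakens the protection it provides, so a generous value is best reserved for files from a trusted source.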