Does anyone know of a serialisation format that:
I have looked at the following:
Smile - doesn't support traversal
BSON - does support traversal! But the maximum document size is 2 GB.
BSON was so close, but the maximum document size kills it for me. Are there any formats that would work? Obviously I can write my own, but there are so many binary JSON formats that surely someone has made a decent one?
Edit: By "traversal" I mean the same thing that the BSON authors mean - you should be able to find a given object without having to parse the entire file. Amazon calls this "sparse" or "shallow" reading.
Found one! Amazon Ion. From the FAQ:
Many reads are shallow or sparse, meaning that the application is focused on only a subset of the values in the stream, and that it can quickly determine if full materialization of a value is required.
In the spirit of these principles, the Ion specification includes features that make Ion’s binary encoding more efficient to read than other schema-free formats. These features include length-prefixing of binary values and Ion’s use of symbol tables.
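To illustrate why length-prefixing makes sparse reads cheap, here is a minimal sketch in Python. It uses a made-up toy layout (2-byte key length, 8-byte value length, key, value), not Ion's actual binary encoding, and the names `write_record` and `find_value` are hypothetical. The point is only that a reader can seek past values it doesn't care about instead of parsing them.

```python
import io
import struct

# Toy header (an assumption for illustration, NOT Ion's real layout):
# little-endian 2-byte key length followed by 8-byte value length.
HEADER = "<HQ"
HEADER_SIZE = struct.calcsize(HEADER)


def write_record(out, key: bytes, value: bytes) -> None:
    """Append one length-prefixed key/value record to a binary stream."""
    out.write(struct.pack(HEADER, len(key), len(value)))
    out.write(key)
    out.write(value)


def find_value(stream, wanted: bytes):
    """Return the value stored under `wanted`, or None if it is absent.

    Values of non-matching records are skipped with seek(), so their
    payload bytes are never read or parsed.
    """
    while True:
        header = stream.read(HEADER_SIZE)
        if len(header) < HEADER_SIZE:
            return None  # end of stream
        key_len, value_len = struct.unpack(HEADER, header)
        key = stream.read(key_len)
        if key == wanted:
            return stream.read(value_len)
        stream.seek(value_len, io.SEEK_CUR)  # jump over the unwanted value


buf = io.BytesIO()
write_record(buf, b"big", b"x" * 1_000_000)
write_record(buf, b"small", b"hello")
buf.seek(0)
result = find_value(buf, b"small")
```

Because the value length here is 8 bytes wide, this toy layout would not hit a 2 GB ceiling the way a 32-bit length does. A real format would also need nesting and type tags, which this sketch leaves out.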
Brief notes on Ion:
It is not very popular. Libraries are available for only a few languages and I can't even find a command line tool that uses it. Still, it seems to be the only option if you want these features!
Edit:
In the end we went with SQLite, which is pretty excellent. It doesn't really follow the JSON data model, but it does let you do sparse reads very easily, and it is very fast. Another possibility is DuckDB, which is kind of a modern take on SQLite but less widely supported.
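For anyone curious what a sparse read looks like with SQLite, here is a rough sketch using Python's standard `sqlite3` module. The table name `objects`, its columns, and the helper `read_one` are made up for the example and are not our actual schema.

```python
import sqlite3

# In-memory database for the example; pass a file path for a real on-disk store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (id INTEGER PRIMARY KEY, name TEXT, payload BLOB)"
)
conn.executemany(
    "INSERT INTO objects (id, name, payload) VALUES (?, ?, ?)",
    [(i, f"object-{i}", bytes(100)) for i in range(1000)],
)
conn.commit()


def read_one(conn, object_id):
    """Fetch a single row by primary key, or None if it does not exist.

    The lookup goes through the table's B-tree on the integer primary key,
    so SQLite does not need to scan the other rows.
    """
    return conn.execute(
        "SELECT name, payload FROM objects WHERE id = ?", (object_id,)
    ).fetchone()


row = read_one(conn, 42)
```

The trade-off is that you have to map your JSON-like data onto tables yourself, which is what I meant by it not following the JSON data model.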