c++ rocksdb

Is RocksDB a good choice for storing homogeneous objects?


I'm looking for an embeddable data storage engine for C++, and I'm considering RocksDB, which is a key-value store.

My data is very homogeneous. I have a modest number of types (on the order of 20), and I store many instances (on the order of 1 million) of those types.

I imagine that the homogeneity of my data makes RocksDB a poor choice. If I serialise each object individually, surely I'm duplicating the schema metadata? And surely that will result in poor performance?

So my question: Is RocksDB a good choice for storing homogeneous objects? If so, how does one avoid the performance implications of duplicating schema metadata?


Solution

  • As I understand it, RocksDB is a key-value store rather than a relational database. You only get the facility to store binary keys and binary values. Unlike a relational database (e.g. MySQL, SQLite), you don't get tables where you can define columns, types, and so on.

    So it is your program that determines how the data is stored.

    One possibility is to store your data as JSON values, in which case, as you say, you pay the cost of storing the "schema" (i.e. the JSON field names) in every value.

    Another choice is to keep a special key (for example, one called SCHEMA) whose value contains an Avro schema for all your object types. Your app can read this on startup, initialise the readers/writers, and then it knows how to process each key and value stored in RocksDB.

    Yet another choice is to hard-code the serialisation logic in your app. You could use any number of libraries for this, including Avro (as mentioned above) or MessagePack (MsgPack) and its variants. In this case you need to be careful if you intend to open a RocksDB database written by a previous version of the app after making schema changes, so consider storing a schema version number in the DB.
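
The hard-coded option with a stored version number could look roughly like the sketch below. Everything named here is an assumption for illustration: the `Point` record, the `SCHEMA_VERSION` key, the `point:7`-style keys, and the helper functions are all hypothetical, and a `std::map<std::string, std::string>` stands in for the database so the example runs without RocksDB installed. With the real library, the encoded strings would be the values passed to `rocksdb::DB::Put` and read back with `rocksdb::DB::Get`. The point is that each value holds only the field bytes (20 bytes here), with no field names repeated per object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical record type, standing in for one of the ~20 object types.
struct Point {
    std::int32_t id;
    double x;
    double y;
};

// Assumed names: the app's current layout version and the key it lives under.
constexpr std::uint32_t kSchemaVersion = 1;
const std::string kSchemaVersionKey = "SCHEMA_VERSION";
constexpr std::size_t kPointEncodedSize =
    sizeof(std::int32_t) + 2 * sizeof(double);

// Stand-in for the key-value store (assumption: not the RocksDB API).
using Store = std::map<std::string, std::string>;

// Append the raw bytes of a value. Uses host byte order, so files are
// only portable between machines with the same endianness.
template <typename T>
void put_raw(std::string& out, const T& v) {
    out.append(reinterpret_cast<const char*>(&v), sizeof(T));
}

// Read a value back from position pos, advancing pos past it.
template <typename T>
T get_raw(const std::string& in, std::size_t& pos) {
    if (pos + sizeof(T) > in.size()) {
        throw std::runtime_error("truncated value");
    }
    T v;
    std::memcpy(&v, in.data() + pos, sizeof(T));
    pos += sizeof(T);
    return v;
}

// Field-by-field encoding: no padding and no field names in the value.
std::string encode_point(const Point& p) {
    std::string out;
    out.reserve(kPointEncodedSize);
    put_raw(out, p.id);
    put_raw(out, p.x);
    put_raw(out, p.y);
    assert(out.size() == kPointEncodedSize);
    return out;
}

Point decode_point(const std::string& bytes) {
    if (bytes.size() != kPointEncodedSize) {
        throw std::runtime_error("unexpected value size for Point");
    }
    std::size_t pos = 0;
    Point p{};
    p.id = get_raw<std::int32_t>(bytes, pos);
    p.x = get_raw<double>(bytes, pos);
    p.y = get_raw<double>(bytes, pos);
    return p;
}

// Record the layout version once, under the special key.
void write_schema_version(Store& db) {
    std::string v;
    put_raw(v, kSchemaVersion);
    db[kSchemaVersionKey] = v;
}

// On startup: true only if the stored version matches what this build expects.
bool schema_matches(const Store& db) {
    auto it = db.find(kSchemaVersionKey);
    if (it == db.end() || it->second.size() != sizeof(std::uint32_t)) {
        return false;
    }
    std::size_t pos = 0;
    return get_raw<std::uint32_t>(it->second, pos) == kSchemaVersion;
}
```

On startup the app would call `schema_matches` before decoding anything, and either migrate or refuse to open the database when it returns false. A library such as Avro or MessagePack would replace the hand-written `encode_point`/`decode_point` pair, but the version check stays the same idea.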