
What is the Riak per-key overhead using the Bitcask backend?


It's a simple question that apparently has a multitude of answers.

Findings have ranged anywhere from:

a. 22 bytes as per Basho's documentation: http://docs.basho.com/riak/latest/references/appendices/Bitcask-Capacity-Planning/

b. ~450 bytes, according to these mailing list posts: http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-August/005178.html http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-May/004292.html

c. And anecdotal records that state overheads anywhere in the range of 45 to 200 bytes.

Why isn't there a straight answer to this? I understand it's an intricate problem - one of the mailing list entries above makes it clear! - but is even coming up with a consistent ballpark so difficult? Why isn't Basho's documentation clear about this?


I have another set of problems related to how I am to structure my logic based on the key overhead (storing lots of small values versus "collecting" them in larger structures), but I guess that's another question.


Solution

  • The static overhead is stated on our capacity planner as 22 bytes because that's the size of the C struct. As noted on that page, the capacity planner is simply providing a rough estimate for sizing.

    The old mailing list post by Nico that you link to is probably the most complete accounting of Bitcask internals you will find, and it is accurate. Adding the 8 bytes for a pointer to the entry and the 13 bytes of Erlang overhead on the bucket/key pair to the 22-byte struct, you arrive at 43 bytes on a 64-bit system.

    As for there not being a straight answer ... actually asking us (via email, the mailing list, IRC, carrier pigeon, etc) will always produce an actual answer.
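
To make the arithmetic in the answer concrete, here is a minimal Python sketch. The three byte counts (22, 8, 13) come from the answer above. Everything else is an assumption made for illustration: the function names are hypothetical, and the sizing formula (per-key overhead plus the bucket and key bytes themselves, multiplied by the key count and a replication factor `n_val`) is only a rough estimate in the spirit of the capacity planner, not an exact model of Bitcask's memory use.

```python
# Byte counts taken from the answer above (64-bit system).
STRUCT_BYTES = 22        # size of the C struct for a keydir entry
POINTER_BYTES = 8        # pointer to the entry
ERLANG_TERM_BYTES = 13   # Erlang overhead on the bucket/key pair


def per_key_overhead() -> int:
    """Static per-key overhead in bytes, excluding the bucket/key bytes."""
    return STRUCT_BYTES + POINTER_BYTES + ERLANG_TERM_BYTES


def estimate_keydir_ram(num_keys: int, avg_bucket_len: int,
                        avg_key_len: int, n_val: int = 3) -> int:
    """Rough cluster-wide keydir RAM estimate in bytes.

    Assumption: each stored replica keeps one in-memory entry holding the
    static overhead plus the raw bucket and key bytes. n_val is the number
    of replicas per key (3 is used here as an illustrative default).
    """
    per_entry = per_key_overhead() + avg_bucket_len + avg_key_len
    return num_keys * per_entry * n_val


if __name__ == "__main__":
    print("per-key overhead (bytes):", per_key_overhead())
    total = estimate_keydir_ram(num_keys=1_000_000, avg_bucket_len=10,
                                avg_key_len=20)
    print("estimated keydir RAM (MiB):", round(total / 2**20, 1))
```

Because the static overhead is fixed per key, the estimate grows linearly with the number of keys, which is the trade-off behind storing many small values versus collecting them into fewer, larger objects.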