Tags: c, database, performance, database-performance, lmdb

Why does LMDB perform better on keys which already have previous data?


I've been working on a system using LMDB and I've stumbled across some weird characteristics that I don't understand, so I'm hoping you can help me.

I've found that if some data (any data, even a smaller value) has previously been written as the key's value, the database is faster and shows less variance in write performance than if I write a new key and its value at the same time.

This doesn't make too much sense to me at the moment. I would have thought that if the key were initialised with data of a different size, there would be no effect. Even if there were some advantage to having space reserved for the key's real value, I wouldn't have expected it to make a noticeable difference. Apparently not, though.

Is there any reason why this might be the case? Are there any other databases that exhibit this kind of behaviour?

Thank you in advance for your help, Michael


Solution

  • if I were to have some data; any data; even smaller sized data; previously written as key's value initially, the database will be faster and have less variance to its write performance

    There are many, many factors that affect perceived database performance. But, to focus on the effect you're observing, consider how LMDB manages its store. You could look at the code, of course, but oftentimes in database implementations, it helps just to consider what you would do if you were confronted with the implementor's problem.

    You seem to be assuming that every LMDB record is exactly as big as the sum of its key & data sizes. If that were so, then smaller sized existing data would force LMDB to allocate a new ... something to hold a larger-sized updated value. And LMDB would have to manage its storage according to the exact size of each record.

    To avoid those problems, ISTR LMDB uses a few memory "bins" of fixed sizes. A new record gets its allocation from the bin whose size accommodates the data. Perhaps the values in your tests are close enough in size to fit in the same bin, so all you're doing is overwriting the same storage.