Search code examples
lmdbokvs

What is special about internal design of LMDB?


What would be the performance difference (reads/writes) between some C++ implementation of in-memory B-Tree (for example google btree) and the LMDB (without taking into consideration all the feacures of LMDB such as transactions, isolation, shared access etc.)?


Solution

  • This 2014 lmdb design presentation by its architect Howard Chu covers the design and tradeoffs of lmdb.

    To summarize: lmdb is a copy-on-write, bottom-up updated, double-buffered, b-tree where the implementation always favors simplicity whenever it clashes with other considerations.

    The smart design choices make it one of the highest performance and corruption-resistant B-tree implementations out there.

    • copy-on-write means that data is never overwritten avoiding many possible corruption scenarios
    • bottom-up updates from leaf to root make the root update equivalent to a commit
    • double buffering of past versions keeps only the last-two roots in the db file
    • complex multi-level caching schemes are avoided - lmdb relies on the underlying OS for caching
    • The whole code base is very small compared to other DBs avoiding CPU cache misses

    Obviously, these choices mean that lmdb is not friendly to complex scenarios such as:

    • multi-version DB rollbacks (only last 2 are available)
    • long-lived transactions and delayed commits: these lead to append-only behavior and potentially unlimited growth of the db file
    • multiple concurrent writers: lmdb favors simpler multiple readers and single writer schemes

    There's much more in the full (over 100 pages) presentation. The above is just a summary of the spirit of lmdb.

    lmdb is used as the core storage engine in prominent open source projects such as Open LDAP and Memcached and in both cases speed-ups of orders of magnitude have been observed compared to alternatives as can be seen in micro-benchmark results.