Search code examples
key-valuekey-value-storelmdb

Implementing persistent associative array on top of LMDB


I am new to key-value stores. I would like to try LMDB as persistent associative array and want to be able to use long length keys such as file paths or URLs.

LMDB defines compile-time constant MDB_MAXKEYSIZE=511 imposing a maximum for key length.

What technique should I use to implement persistent long-length key dictionary on top of LMDB? Some kind of hashing and collision resolution? Or recompile with different MAXKEYSIZE, for example 2048? Or LMDB is a wrong tool for this job?


Solution

  • In my own use, I handled a similar case by hashing the arbitrary-length string keys through SHA-256 to produce a fixed-size 32-byte digest key for the B+ tree. This strategy entails the following two caveats:

    1. You do altogether lose the ability to perform prefix string lookups. This isn't usually a significant problem for a use case such as the one you describe.

    2. In case you need to be able to enumerate all the original string keys, you'll need to store the original strings in another LMDB sub-database such that you can perform a lookup based on a SHA-256 digest to retrieve its corresponding original string--which will have been stored as the value of a key/value pair in the sub-database, thus not subject to MDB_MAXKEYSIZE limits. This does increase space requirements, adds an additional tree lookup for every cursor operation in a full scan, and will not yield the strings in lexicographical order. If you can tolerate those limitations, this is a workable approach.