Tags: sqlite, go, embedded-database, rocksdb

Embedded key-value db vs. just storing one file per key?


I'm confused about the advantage of embedded key-value databases over the naive solution of just storing one file on disk per key. Databases like RocksDB, Badger, and SQLite use sophisticated data structures like B+ trees and LSM trees, yet they seem to get roughly the same performance as this simple approach.

For example, Badger (reportedly the fastest embedded database for Go) takes about 800 microseconds to write an entry. In comparison, creating a new file from scratch and writing some data to it takes about 150 microseconds, with no optimization at all.

EDIT: to clarify, here's the simple implementation of a key-value store I'm comparing with the state-of-the-art embedded DBs. Just hash each key to a string filename and store the associated value as a byte array in the file at that path. Reads and writes are ~150 microseconds each, which is faster than Badger for single operations and comparable for batched operations. Furthermore, the disk usage is minimal, since we don't store any extra structure besides the actual values.
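
Here is a minimal sketch of that naive file-per-key store. The SHA-256 hashing and flat directory layout are illustrative assumptions, not taken from any particular library:

```go
package filestore

import (
	"crypto/sha256"
	"encoding/hex"
	"os"
	"path/filepath"
)

// Store maps each key to one file on disk; the value is the file's contents.
type Store struct {
	dir string
}

func New(dir string) (*Store, error) {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return nil, err
	}
	return &Store{dir: dir}, nil
}

// path hashes the key so arbitrary keys become safe, fixed-length filenames.
func (s *Store) path(key string) string {
	sum := sha256.Sum256([]byte(key))
	return filepath.Join(s.dir, hex.EncodeToString(sum[:]))
}

// Put writes the value as the whole file for this key.
func (s *Store) Put(key string, value []byte) error {
	return os.WriteFile(s.path(key), value, 0o644)
}

// Get reads the file back; a missing file means the key is absent.
func (s *Store) Get(key string) ([]byte, error) {
	return os.ReadFile(s.path(key))
}
```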

I must be missing something here, because the solutions people actually use are heavily optimized, using things like Bloom filters and B+ trees.


Solution

  • But Badger is not built around writing a single entry at a time:

    My writes are really slow. Why?

    Are you creating a new transaction for every single key update? This will lead to very low throughput.

    To get best write performance, batch up multiple writes inside a transaction using single DB.Update() call.
    You could also have multiple such DB.Update() calls being made concurrently from multiple goroutines.

    That leads to issue 396:

    I was looking for fast storage in Go and so my first try was BoltDB. I need a lot of single-write transactions. Bolt was able to do about 240 rq/s.

    I just tested Badger and I got a crazy 10k rq/s. I am just baffled

    That is because:

    LSM tree has an advantage compared to B+ tree when it comes to writes.
    Also, values are stored separately in value log files so writes are much faster.

    You can read more about the design here.
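
    To make the batching advice concrete, here is a minimal sketch of the two write patterns using Badger's DB.Update (assuming a recent Badger release; the paths, key names, and counts are illustrative):

    ```go
    package main

    import (
        "fmt"
        "log"

        badger "github.com/dgraph-io/badger/v4"
    )

    func main() {
        // Open a Badger database in a local directory (path is illustrative).
        db, err := badger.Open(badger.DefaultOptions("/tmp/badger-demo"))
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // Slow pattern: one transaction (and one commit) per key.
        for i := 0; i < 1000; i++ {
            err := db.Update(func(txn *badger.Txn) error {
                return txn.Set([]byte(fmt.Sprintf("key-%d", i)), []byte("value"))
            })
            if err != nil {
                log.Fatal(err)
            }
        }

        // Faster pattern: many writes batched into a single DB.Update call,
        // so the commit cost is amortized over the whole batch.
        // (Very large batches may need splitting if Badger returns ErrTxnTooBig.)
        err = db.Update(func(txn *badger.Txn) error {
            for i := 0; i < 1000; i++ {
                if err := txn.Set([]byte(fmt.Sprintf("batch-%d", i)), []byte("value")); err != nil {
                    return err
                }
            }
            return nil
        })
        if err != nil {
            log.Fatal(err)
        }
    }
    ```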


    One of the main points (hard to replicate with simple file reads and writes) is:

    Key-Value separation

    The major performance cost of LSM-trees is the compaction process. During compactions, multiple files are read into memory, sorted, and written back. Sorting is essential for efficient retrieval, for both key lookups and range iterations. With sorting, the key lookups would only require accessing at most one file per level (excluding level zero, where we’d need to check all the files). Iterations would result in sequential access to multiple files.

    Each file is of fixed size, to enhance caching. Values tend to be larger than keys. When you store values along with the keys, the amount of data that needs to be compacted grows significantly.

    In Badger, only a pointer to the value in the value log is stored alongside the key. Badger employs delta encoding for keys to reduce the effective size even further. Assuming 16 bytes per key and 16 bytes per value pointer, a single 64MB file can store two million key-value pairs.
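
    As a rough check of that arithmetic: 64 MB / (16 B key + 16 B pointer) ≈ 2 million entries per file. The key-value separation idea itself can be sketched in a few lines. This is a toy illustration under simplified assumptions, not Badger's actual implementation: the key index holds only a small fixed-size pointer per key, while the values themselves are appended to a separate log file.

    ```go
    package vlog

    import (
        "encoding/binary"
        "os"
    )

    // ValuePointer locates a value inside the value log: which log file it
    // lives in, at what offset, and how long it is. Packed, this is on the
    // order of 16 bytes, which is why the key-side structure stays so small.
    type ValuePointer struct {
        FileID uint32
        Offset uint32
        Len    uint32
    }

    // Store is a toy key/value-separated store: keys and pointers live in an
    // in-memory map (standing in for the sorted LSM tree), while the values
    // themselves are appended to a single value-log file.
    type Store struct {
        index map[string]ValuePointer
        vlog  *os.File
        off   uint32
    }

    func Open(path string) (*Store, error) {
        f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o644)
        if err != nil {
            return nil, err
        }
        return &Store{index: make(map[string]ValuePointer), vlog: f}, nil
    }

    // Put appends a length-prefixed value to the log and records only a small
    // pointer alongside the key, so reorganizing the key index never has to
    // rewrite the (usually much larger) values.
    func (s *Store) Put(key string, value []byte) error {
        var hdr [4]byte
        binary.LittleEndian.PutUint32(hdr[:], uint32(len(value)))
        if _, err := s.vlog.WriteAt(append(hdr[:], value...), int64(s.off)); err != nil {
            return err
        }
        s.index[key] = ValuePointer{FileID: 0, Offset: s.off + 4, Len: uint32(len(value))}
        s.off += 4 + uint32(len(value))
        return nil
    }

    // Get follows the pointer straight into the value log.
    func (s *Store) Get(key string) ([]byte, bool, error) {
        vp, ok := s.index[key]
        if !ok {
            return nil, false, nil
        }
        buf := make([]byte, vp.Len)
        if _, err := s.vlog.ReadAt(buf, int64(vp.Offset)); err != nil {
            return nil, false, err
        }
        return buf, true, nil
    }
    ```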