Search code examples
data-structuresred-black-tree2-3-4-tree

Real-World Performance of Red-Black vs. 2-3-4 trees, especially considering cache performance?


A single node of a 2-3-4 tree could be constructed with 8 pointers: pointers to up to four child nodes, pointers to up to 3 actual records containing keys that will either match a search key or will determine which of 4 child nodes to recurse to, and a parent node pointer.

Common hardware today has 8-byte pointers, giving a 64-byte node. Further, modern CPUs have 64-byte cache lines. Should the nodes be aligned with the cache lines, then each node requires only one cache line hit: after referring to the first of seven pointers, all the rest will be in your L1 cache.

While a red-black tree is far simpler to implement, and small code should be fast code, each level of descent in the tree risks an L1 cache miss. For 1023 objects, a 2-3-4 tree needs a a worst-case of 5 nodes to be loaded into cache. A perfectly-balanced binary tree would need 10, but due to imbalance a Red-Black may need more (not sure the worst case: 20?)

Small test harnesses that simply hammer at one data structure will probably keep it all in cache, and so may report the Red-Black tree as being similar performance to the 2-3-4. But I have a feeling that a complicated real-world application may see much less wall-clock time and lower latency with 2-3-4 trees.

Is there any consensus or research on this?


Solution

  • Your reasoning is certainly correct -- for cold lookups the 2-3-4 tree will perform better just because it hits fewer cache lines.

    If the performance of the tree is important, though, that generally means that you're using it often.

    If the tree is being using often and it's not pretty much all in the cache, then it must be big. When a big tree is used often, the higher-level nodes will generally be cached, because each level up is hit twice as often as the level below on average.

    So the real performance improvement when it matters is limited to the deepest few levels in the tree. You can still see a performance with with a 2-3-4 tree, but it's not a runaway, and I think you'd need a special reason to judge it worth the extra code complexity (especially in search and iteration).