Search code examples
c++data-structurestreeb-tree

B+ tree with variable length keys


In a common implementation of a B+ tree, we can assume that keys have a fixed length (e.g. 25 bytes). We can then define that each node must have a minimum number of keys, and a maximum.

If I wanted the tree to accept variable length keys, what should I modify? What if I say that the node must have at least 2 keys, but the key which i'm trying to insert is so big that it doesn't fit into the Block that holds the node?


Solution

  • The simple solution is to store the keys as pointers (wrapped in a type that overrides the relative operators etc) rather than values, but that of course damages the locality that is part of the point of using B+ trees.

    That said, the larger the items, the less it matters that items are adjacent in memory. Huge items won't fit even one to a cache page, let alone several in the same page.

    Another relatively simple approach is to use a union type or placement new or whatever to allocate items within a memory-for-item type that's big enough for all item types you might use. You still have a fixed number of bytes per item, but the items don't necessarily use all those bytes.

    If you're willing to do the work, you could have variable-sized nodes. You'll have some hassles working with those nodes, of course, depending on how you arrange the in-node data structure to cope with that. You might have a small array of item-pointers within the node, for instance, pointing to the items which are also inside the node (not separately allocated on the heap).

    Also, every time you change a node you may need to reallocate it. Even if all you're doing is rebalancing, that might move a huge item from one node into another, and even though the destination node has room in the sense of having a slot free for an item it may not have enough bytes to store the value.

    In a sense, each node would be a mini-heap in which you can allocate or release space for items big or small, but sometimes you'd have to go back to the heap proper to replace that mini-heap with a bigger or smaller one.

    It's again worth mentioning that if the items are that huge, locality within a node probably isn't relevant anyway.

    I've implemented B+-style multiway trees in memory before myself, but I've never gone to this extreme.