Search code examples
data-structurestreeb-tree

The order of b-trees


I'm studying for an exam, and came up on B-trees. Wikipedia describes a B-tree as a tree where the nodes have at least d and at most 2d keys and therefore at most 2d+1 leafs. For example if d=1, it would have a maximum of 2 keys, and 3 children, making it a 2-3 tree. However this wouldn't allow for example a 2-3-4 tree unless I'm mistaken.

However our material describes a b-tree as a tree where each node has at least t>=2 t-1 keys and at most 2t-1 keys. This would mean that the nodes have an odd number of keys and an even number of children. For example t=2 would have from 1 to 3 keys, and up to 4 children, making it a 2-3-4 tree. On the other hand there couldn't be a 2-3 tree with this notation.

On top of this, there is a notation by Knuth where the d would mean the maximum number of children in a node. This notation would allow both even and odd number of children, allowing both 2-3 trees and 2-3-4 trees.

I know both 2-3 trees and 2-3-4 trees exist.

What is the real notation? Is there a real notation? As an extra question; what is the maximun number of keys in a tree of size h?


Solution

  • If you read the wiki article a little more closely you'll see that there are two main variants of B-trees (excluding structural variants like B* and B+), one with t -> 2t keys, and one with t -> 2t+1 keys.

    Translating those variants to #children we have one with t+1 -> 2t+1 children, and one with t+1 -> 2t+2 children.

    So essentially to answer your question, both 2-3 and 2-3-4 trees are valid trees, each according to a different variant/definition:

    2-3 is of the first kind (t+1 -> 2t+1 children where t=1)

    2-3-4 is of the second kind (t+1 -> 2t+2 children where t=1)

    The validity of both variants stems from the fact that both splits and merges (actions done on delete and insert from/into the ADT) are valid:

    t -> 2t:

    Split. Happens when we add a new element and a node has more than the max number of keys 2t So we have 2t+1 keys, we split the node into two nodes, and push one element to the parent, leaving 2t keys in the two children, and t keys in each child. This is acceptable because the minimum number of keys in a node is indeed t.

    Merge. Happens when we delete a key and a node contains less than the minimum number, t, and it's sibling is also at the minimum. So we have t-1 + t keys in our new merged node, the resulting node must be valid: t-1 + t = 2t-1 <= 2t. Great.

    So too with t -> 2t+1:

    Split. 2t+2 becomes t and t+1 which is OK.

    Merge. t-1 + t = 2t-1 <= 2t+1

    Of course there is no difference in running times, it's just a slight implementation difference of little theoretical importance (you can slightly optimise some algorithms with the first variant, but not so much that it will change run-time complexities).