Search code examples
algorithmdata-structurestreeb-tree

Is there a way to evaluate the number of leaves needed by a B+ tree?


If I have t = #tuples (or #records) and the only other info I can exploit is the fanout F of my B+ tree, is there a way to get how many leaves (#blocks for the leaves) the tree will need ? Assuming the tree must be balanced and so keeping the most information in the least space possible.

Example: t = 40K, F = 40 ==> depth = log_40(40K) = 3 , #leaves = ???


Solution

  • In a B+ tree the actual records are referenced from within the leaves. The internal nodes on the other hand don't reference records directly, but define ranges for key values in order to traverse through the tree to the leaf(block) that finally has the key values of the records of interest.

    Another property of B+ trees is that the number of children of an internal node not only has an upper bound (i.e. the fanout F) but also a lower bound: F/2. (I'm ignoring that the root is exempted from this lower-bound rule). This 1/2 coefficient is the fill factor (0.5).

    The number of leaf-blocks (L) is bound by the number of records (t) and the fanout (F), but not by the way the tree is (un)balanced. In the best, minimal case we have:

            min(L) = ⌈ t / F⌉

    In the worst, maximal case we have:

            max(L) = ⌈ 2t / F⌉

    If you have a different fill factor (c >= 0.5), then the worst case is:

            max(L) = ⌈ t / cF⌉

    Be aware that increasing the fill factor reduces used space, but increases time overhead when inserting records. When the fill factor would be close to 1, space usage is kept at the bare minimum, but updates will be slow.