Search code examples
algorithmb-tree

Sequentially Constructing Full B-Trees


If I have a sorted set of data, which I want to store on disk in a way that is optimal for both reading sequentially and doing random lookups on, it seems that a B-Tree (or one of the variants is a good choice ... presuming this data-set does not all fit in RAM).

The question is can a full B-Tree be constructed from a sorted set of data without doing any page splits? So that the sorted data can be written to disk sequentially.


Solution

  • Constructing a "B+ tree" to those specifications is simple.

    1. Choose your branching factor k.
    2. Write the sorted data to a file. This is the leaf level.
    3. To construct the next highest level, scan the current level and write out every kth item.
    4. Stop when the current level has k items or fewer.

    Example with k = 2:

    0 1|2 3|4 5|6 7|8 9
    0   2  |4   6  |8
    0       4      |8
    0               8
    

    Now let's look for 5. Use binary search to find the last number less than or equal to 5 in the top level, or 0. Look at the interval in the next lowest level corresponding to 0:

    0       4
    

    Now 4:

            4   6
    

    Now 4 again:

            4 5
    

    Found it. In general, the jth item corresponds to items jk though (j+1)k-1 at the next level. You can also scan the leaf level linearly.