If I have a sorted set of data, which I want to store on disk in a way that is optimal for both reading sequentially and doing random lookups on, it seems that a B-Tree (or one of the variants is a good choice ... presuming this data-set does not all fit in RAM).
The question is can a full B-Tree be constructed from a sorted set of data without doing any page splits? So that the sorted data can be written to disk sequentially.
Constructing a "B+ tree" to those specifications is simple.
Example with k = 2:
0 1|2 3|4 5|6 7|8 9
0 2 |4 6 |8
0 4 |8
0 8
Now let's look for 5
. Use binary search to find the last number less than or equal to 5
in the top level, or 0
. Look at the interval in the next lowest level corresponding to 0
:
0 4
Now 4
:
4 6
Now 4
again:
4 5
Found it. In general, the jth item corresponds to items jk though (j+1)k-1 at the next level. You can also scan the leaf level linearly.