Search code examples
b-treedatabase-indexesdisk-io

Why can't a disk block store multiple index tree nodes


Recently, I was studying the implementation of index tree in database, and learned that using B+tree can store as many keys as possible in a disk block, so that the search process can read as few disks as possible.

But I have a question, why can't a disk block store multiple index tree nodes? The pointer of each node can be designed as blockNumber + offset.


Solution

  • A disk block certainly can store multiple index tree nodes. All you need to do is define whatever block size you want, and the file system maps it to the correct physical block.

    In general you don't want to do this, but mainly for performance reasons. Over time the tree nodes tend to get scattered all over the place, and then loading them back in ends up loading a full physical block. The rest of the nodes in the block might not be needed, but you paid the cost of loading more than you needed. In order to amortize the cost, its a good idea to match the node size to the block size. This helps improve the cost of performing index scans, since you'll end up reading the whole node anyhow.

    If the index is being accessed in a purely random fashion, and the entries are relatively small, then it doesn't matter if the node size is smaller than the physical block size. But databases aim to be general purpose, and they don't really know in advance how an index will end up being used. Matching the node size to the physical block size is a safe option.