Search code examples
databaseindexinglinked-listb-tree

Database Indexes B-Trees and Lists


Can anyone explain why databases tend to use b-tree indexes rather than a linked list of ordered elements.

My thinking is this: On a B+ Tree (used by most databases), the none-leaf nodes are a collection of pointers to other nodes. Each collection (node) is a ordered list. The leaf nodes, which is where all the data pointers are, is a linked list of clusters of data pointers.

The non-leaf nodes are just used to find the correct leaf node in which your target data pointer lives. So as the leaf nodes are just like a linked list, then why not just do away with the tree elements and just have the linked list. Meta data can be provided which gives the minimum and maximum value of each leaf node cluster, so the application can just read the meta data and find the correct leaf where the data pointer lives.

Just to be clear that the most efficent algorithm for searching an random accessed ordered list is an binary search which has a performance of O(log n) which is the same as a b-tree. The benifit of using a linked list rather than a tree is that they don't need to be ballanced.

Is this structure feasible.


Solution

  • After some research and paper reading I found the answer.

    In order to cope with large amounts of data such a millions of records, indexes have to be organised into clusters. A cluster is a continuous group of sectors on a disk that can be read into memory quickly. These are usually about 4096 bytes long.

    Each one of these clusters can contain a bunch of indexes which can point to other clusters or data on a disk. So if we had a linked list index, each element of the index would be made up of the collection of indexes contained in a single cluster (say 100).

    So, when we are looking for a specific record, how do we know which cluster it is on. We perform a binary search to find the cluster in question [O(log n)].

    However, to do a binary search we need to know where the range of values in each clusters, so we need meta-data that says the min and max value of each cluster and where that cluster is. This is great. Except if each cluster can contain 100 indexes, and our meta data is also held on a single cluster (for speed) , then our meta data can only point to 100 clusters.

    What happens if we want more than 100 clusters. We have to have two meta-data indexes, each pointing to 100 clusters (10 000 records). Well that’s not enough. Lets add another meta-data cluster and we can now access 1 000 000 records. So how do we know which one of the three meta-data clusters we need to query in order to find our target data cluster. We could search one then the other, but that doesn’t scale. So I add another meta-meta-data cluster to indicate which one of the three meta-data clusters I should query to find the target data cluster. Now I have a tree!

    So that’s why databases use trees. It’s not the speed it’s the size of the indexes and the need to have indexes referencing other indexes. What I have described above is a B+Tree – child nodes contain references to other child nodes or leaf nodes, and leaf nodes contain references to data on disk.

    Phew!