
Understanding Disk Reads and Cache Misses for B-tree and BST Queries


I am trying to understand how disk reads are handled when performing query operations on B-trees and have TWO questions.

I have the following B-tree structure:

            [10, 20]
         /     |      \
 [1, 5, 7] [15, 18] [25, 28, 30, 35]

Suppose I have a query: SELECT * FROM table WHERE id = 15.

From my understanding, the process involves:

  1. Start with the Root Node:
  • The root node [10, 20] is fetched from the disk to memory in a single read operation.
  • The processor compares the key 15 with the keys in the root node and determines that it should follow the second child pointer, leading to the node [15, 18].
  2. Fetch the Next Relevant Node:
  • A second read operation fetches the node [15, 18] from the disk to memory.
  • The processor then searches within this node to find the key 15.

So, the process involves:

  • First Disk Read: Fetch the root node [10, 20].
  • Processor Comparison: Determine which child node to access.
  • Second Disk Read: Fetch the relevant child node [15, 18].
  • Processor Comparison: Locate the key 15 within the node [15, 18] (sketched in code below).
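
To check my understanding, here is how I picture it in code (an illustrative Python sketch, not any real engine's implementation):

    class BTreeNode:
        def __init__(self, keys, children=None):
            self.keys = keys          # keys stored contiguously in the node
            self.children = children  # child pointers; None for a leaf

    def btree_search(node, key):
        disk_reads = 0
        while node is not None:
            disk_reads += 1                    # fetching a node = one disk read
            for i, k in enumerate(node.keys):  # processor comparisons
                if key == k:
                    return k, disk_reads
                if key < k:
                    break                      # descend into the i-th subtree
            else:
                i = len(node.keys)             # key exceeds all keys in this node
            node = node.children[i] if node.children else None
        return None, disk_reads

    root = BTreeNode([10, 20], [BTreeNode([1, 5, 7]),
                                BTreeNode([15, 18]),
                                BTreeNode([25, 28, 30, 35])])
    print(btree_search(root, 15))  # -> (15, 2): found after 2 simulated disk reads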

My FIRST question is: Does this description accurately represent how disk reads and processor comparisons are handled during a query on a B-tree? Is there anything I am missing or misunderstanding about this process?

Secondly, I have read the following quote in another Stack Overflow post, Advantage of B+ trees over BSTs:

"A BST touches fewer memory locations on lookups than B trees, but the cost of those accesses is high because each access likely costs a cache miss."

When converting the above B-tree to a BST, it would look something like this:

         10
       /    \
      5      20
     / \    /  \
    1   7  15   30
             \  /  \
             18 25  35
                  \
                  28

For a query SELECT * FROM table WHERE id = 15 in this BST, three nodes are accessed (10, 20, 15). My SECOND question has three parts:

  • What does "Fewer Memory Locations" mean in this context?
  • Why is the cost of accesses considered high due to cache misses?
  • With a B-tree, two nodes are accessed (root and child node), but the quote implies that BST touches fewer memory locations. How can three node accesses in a BST be considered fewer memory locations than two node accesses in a B-tree?

Thank you for your help! (Any detailed explanations or references to documentation would be greatly appreciated!)


Solution

  • Does this description accurately represent how disk reads and processor comparisons are handled during a query on a B-tree?

    Yes, broadly. There are, of course, nuances, like:

    • "processor comparison" would only be factual if the key is a primitive that is supported by the processor (an integer, a float). Anything more complex (like a string, an integer size that exceeds the CPU word size, ...etc) will need a higher-level comparison algorithm.
    • "disk read" will only happen if the block is not in memory (cache) already. If several subsequent lookups are performed, it might well be that the root block is still in memory and no disk read is involved.

    Concerning the claim that "a BST touches fewer memory locations on lookups than B trees": this is only true if the B-tree search performs a linear scan among the keys within a single node. This is usually the case.
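
    To see what that assumption amounts to, here is a minimal sketch contrasting the two within-node strategies (function names and counters are mine, purely for illustration):

      def linear_scan(keys, key):
          # touches about len(keys)/2 key locations on average
          touched = 0
          for k in keys:
              touched += 1
              if key <= k:
                  break
          return touched

      def binary_scan(keys, key):
          # touches only about log2(len(keys)) key locations
          lo, hi, touched = 0, len(keys), 0
          while lo < hi:
              touched += 1
              mid = (lo + hi) // 2
              if keys[mid] < key:
                  lo = mid + 1
              else:
                  hi = mid
          return touched

      print(linear_scan(list(range(100)), 75))  # 76 key reads
      print(binary_scan(list(range(100)), 75))  # about log2(100), i.e. 6-7 reads

    With binary search inside each node, the total over a whole lookup drops to roughly log_2(n) key reads, the same as a balanced BST, which is why the quoted claim presupposes the linear scan.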

    You ask:

    What does "Fewer Memory Locations" mean in this context?

    Each key and each pointer to a child represents a memory location, but we can limit the analysis to counting the memory locations that correspond to keys. And yes, in your example, the B-tree search for 15 would access three such locations (10, 20, 15).
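
    Counting them mechanically for the two trees in the question (a self-contained sketch; the tuple encodings are just my shorthand):

      # B-tree node = (keys, children); BST node = (key, left, right).
      def btree_touched(node, key, out):
          keys, children = node
          for i, k in enumerate(keys):
              out.append(k)                      # one key location read
              if key == k:
                  return
              if key < k:
                  return btree_touched(children[i], key, out)
          return btree_touched(children[-1], key, out)

      def bst_touched(node, key, out):
          k, left, right = node
          out.append(k)                          # one node, hence one key, read
          if key != k:
              bst_touched(left if key < k else right, key, out)

      btree = ([10, 20], [([1, 5, 7], None),
                          ([15, 18], None),
                          ([25, 28, 30, 35], None)])
      leaf = lambda v: (v, None, None)
      bst = (10, (5, leaf(1), leaf(7)),
                 (20, (15, None, leaf(18)),
                      (30, (25, None, leaf(28)), leaf(35))))

      a, b = [], []
      btree_touched(btree, 15, a)   # a == [10, 20, 15] -- 3 keys in 2 nodes
      bst_touched(bst, 15, b)       # b == [10, 20, 15] -- 3 keys in 3 nodes
      print(a, b)

    Both lookups read the same three key locations; the difference lies in how many distinct nodes (and thus potential cache misses) those keys are spread across.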

    Why is the cost of accesses considered high due to cache misses?

    Memory access can be very fast when its location is adjacent to the previous access (in the forward direction); CPUs are optimised for this pattern. Secondly, when disk reads are involved, although BST nodes could be read from disk in groups, we have no guarantee that the next key we want is already in memory: each key retrieval (going to the left or right child in a BST) could incur a cache miss and a new disk read. With B-trees this is also true at the level of B-tree nodes, but there are far fewer of them. Cache misses do not occur when reading more keys from the same B-tree node, which is typically what happens during a search: once the first key of a node has been read, the next few keys in that node are almost certainly cached as well. Moreover, those keys are read left to right at adjacent memory locations, which maximises access efficiency. This is an advantage a BST does not offer.
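
    As a rough illustration of that layout difference (Python stand-ins; in a real engine a node would be a raw disk block, and exact caching behaviour depends on the hardware):

      import array

      # A B-tree node keeps all of its keys in one contiguous buffer, so a
      # left-to-right scan walks adjacent addresses: after the first access,
      # the following keys are very likely served from the same cache line
      # (or at least from the block that was just read from disk).
      btree_node_keys = array.array('q', [25, 28, 30, 35])  # adjacent 8-byte ints

      # A BST gives every key its own node object; following a child pointer
      # jumps to wherever that node happens to live in memory, so every step
      # down the tree risks a fresh cache miss (or disk read).
      class BSTNode:
          def __init__(self, key, left=None, right=None):
              self.key, self.left, self.right = key, left, right

      root = BSTNode(10, BSTNode(5), BSTNode(20))  # three separate allocations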

    With a B-tree, two nodes are accessed (root and child node), but the quote implies that BST touches fewer memory locations. How can three node accesses in a BST be considered fewer memory locations than two node accesses in a B-tree?

    It is not the node accesses that are counted, but the key accesses. In your example, both searches access three keys. The B-tree search has the advantage that it incurs at most 2 cache misses, while the BST search could incur 3. The example is too small to show a significant difference, though; the difference becomes more telling with a higher-degree B-tree and more levels, so examples need to be quite large before it shows.

    Proof of the claim

    The claim has two parts:

    1. A BST touches fewer memory locations on lookups than B trees,
    2. The cost of those [BST] accesses is high because each access likely costs a cache miss.

    In the second part, the qualifier "likely" is accurate at the level of CPU memory access (as the memory involved will not be adjacent), while for disk reads it would be just "possibly", not "likely".

    For the first part of the claim, let's generalise and say we have a B-tree with a maximum degree m, which means the average number of keys in a node could be k ≈ 0.75m. If that tree holds n values, its expected height is about log_k(n). The majority of the keys reside in the bottom layer of the B-tree, so to find a key we expect to read log_k(n) nodes, and in each of them we make k/2 comparisons on average (linear scan). So the total number of comparisons (each involving reading a key from memory) is (k/2)·log_k(n) on average.

    Now if we store those n values in a BST, we have n nodes and an expected height of log_2(n) (if the tree is balanced), and we will visit log_2(n) nodes, making one comparison at each.

    To compare these two numbers, we can take their ratio and see whether it is less than or greater than 1:

          ratio = (k/2)·log_k(n) / log_2(n)

    Let's convert the base-k logarithm into base-2 logarithms:

          log_k(n) = log_2(n) / log_2(k)

    So:

          ratio = (k/2)·log_2(n) / (log_2(k)·log_2(n))

    ...which simplifies to:

          ratio = (k/2) / log_2(k)

    For k > 4, this ratio is greater than 1 (it equals 1 at k = 2 and k = 4, and dips slightly below 1 in between), which means we expect more key comparisons in a B-tree as soon as its maximum degree m is about 6 or more (since k ≈ 0.75m). Real disk-based B-trees have degrees in the hundreds, so the first part of the claim holds in practice.
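
    A quick numeric check of that ratio (plain Python; note that n cancels out of the ratio, so only k matters):

      from math import log2

      def btree_comparisons(k, n):    # (k/2) * log_k(n): linear scan per node
          return (k / 2) * log2(n) / log2(k)

      def bst_comparisons(n):         # log_2(n): one comparison per node
          return log2(n)

      for k in (2, 3, 4, 8, 100):
          print(k, round((k / 2) / log2(k), 3))   # the ratio derived above
      # -> 2: 1.0, 3: 0.946, 4: 1.0, 8: 1.333, 100: 7.526

      # With n = 1,000,000 keys and k = 100 keys per node, a B-tree lookup
      # reads about 150 key locations spread over ~3 nodes, while a balanced
      # BST reads about 20 keys spread over ~20 separately allocated nodes.
      print(btree_comparisons(100, 10**6), bst_comparisons(10**6))

    So the BST wins on the count of key locations touched, while the B-tree wins on the number of distinct nodes fetched (and therefore on likely cache misses and disk reads), which is exactly the trade-off the quote describes.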