Search code examples
algorithmdata-structuresb-tree

Can the max and min height of b+ tree be the same value?


I have this information:

  1. 10 million records.
  2. Each record is 256 bytes.
  3. Each record has a 32-bytes index field.
  4. Our system’s disk block size is 8KB.
  5. Our system is 32-bits, so the pointers are 4-bytes. (use this for the links!).

I want to find the max and minimum height of this B+ tree. and when I did had the same value for the max and min value is that possible? if not can you point to the mistake?

The steps I took

DATA NODE Related Calculations

assume L as the data count in a Data Node (leaf node)

  • Node size = Data count * Record size
  • 8KB = L * 256 bytes
  • 8192 bytes = L * 256-bytes
  • L = 8192 / 256
  • L = 32

Remember: Data nodes can store up to 32 Records! So, the maximum data count for each data node is 32. This means, a data node can store at least 32/2=16 data at least/minimum!

So

  • Min data count is 16
  • Max data count is 32

INDEX NODE Related Calculations

  • Node size = (M – 1) * index/key + M * links (pointers)
  • 8KB = (M – 1) * 32 bytes + M * 4 bytes
  • 8192 bytes = (M – 1) * 32 + 4* M
  • 8192 = 36M – 32
  • 36M = 8224
  • M = 228

Our data nodes can store up to 32 records maximum and 16 records minimum. Our index nodes can store up to 227 indexes maximum, 113 indexes minimum.

Finding the minimum:

  • 10 million records = 32 records * leaf/data node count
  • Leaf node count = 10 million / 32
  • Leaf node count = 312500

So, to store 10 million records we need 312500 FULL data nodes!

For 312500 data nodes we need to have 312500 links from index nodes!


  • 312500 = link count * index nodes
  • 312500 = 228 * index nodes
  • index nodes = 1370
  • Since 1370 is bigger than 228. This means we need to have upper index nodes!

  • 1370 = 228 * upper index nodes
  • upper index nodes = 6 which is less than 228. This means we need a root node on top of these 6 index nodes!

So min height is 3:

enter image description here

Finding the maximum:

  • 10 million records = 16 records * leaf/data node count
  • Leaf node count = 10 million / 16
  • Leaf node count = 625000

So, to store 10 million records we need 625000 FULL data nodes!

For 625000 data nodes we need to have 625000 links from index nodes!


  • 625000 = link count * index nodes
  • 625000 = 228 * index nodes
  • index nodes = 2741

Since 2741 is bigger than 228. This means we need to have upper index nodes!


  • 2741 = 228 * upper index nodes

upper index nodes = 12 which is less than 228. This means we need a root node on top of these 12 index nodes!

Question: So max height is 3 also ?

enter image description here


Solution

  • Yes that is perfectly possible. A difference is clear in the number of children that the root will get, which is tiny in the first case, and a multiple of that in the second case.

    I should note that you had a few errors. The main error is that in the last part ("finding the maximum"), you did not reduce the use of the index nodes to half their capacity. You continued with 228, while you should have used 114 instead.

    Secondly, in the first case, you need to round the divisions upwards, as you will need an extra node to cover for the remainder of the division (with some redistribution of keys from a neighboring node). In the second case it is right to round downwards, meaning that you need to add the remainder of the division to a node (as it has room for it).

    These corrections do not influence the final conclusion that 3 levels are needed in both cases. An overview:

    Fill style Records / leaf Keys / index Data nodes Level 3 Level 2 Level 1
    compact 32 228 312500 1371 7 1
    sparse 16 114 625000 5482 49 1