Search code examples
algorithmdata-structurestreebinary-search-treeb-tree

is not the benefit of B-Tree lost when it is saved in File?


I was reading about B-Tree and it was interesting to know that it is specifically built for storing in secondary memory. But i am little puzzled with few points:

  1. If we save the B-Tree in secondary memory (via serialization in Java) is not the advantage of B-Tree lost ? because once the node is serialized we will not have access to reference to child nodes (as we get in primary memory). So what that means is, we will have to read all the nodes one by one (as no reference is available for child). And if we have to read all the nodes then whats the advantage of the tree ? i mean, in that case we are not using the binary search on the tree. Any thoughts ? B-Tree

Solution

  • When a B-Tree is used on disk, it is not read from file, deserialized, modified, and serialized, and written back.

    A B-Tree on disk is a disk-based data structure consisting of blocks of data, and those blocks are read and written one block at a time. Typically:

    • Each node in the B-Tree is a block of data (bytes). Blocks have fixed sizes.
    • Blocks are addressed by their position in the file, if a file is used, or by their sector address if B-Tree blocks are mapped directly to disk sectors.
    • A "pointer to a child node" is just a number that is the node's block address.
    • Blocks are large. Typically large enough to have 1000 children or more. That's because reading a block is expensive, but the cost doesn't depend much on the block size. By keeping blocks big enough so that there are only 3 or 4 levels in the whole tree, we minimize the number of reads or writes required to access any specific item.
    • Caching is usually used so that most accesses only need to touch the lowest level of the tree on disk.

    So to find an item in a B-Tree, you would read the root block (it will probably come out of cache), look through it to find the appropriate child block and read that (again probably out of cache), maybe do that again, finally read the appropriate leaf block and extract the data.