Search code examples
databasejoinnested-loops

why nr ∗ bs + br for block transfer in worst case Nested-Loop Join?


Considering the following join algorithm:

 For each r ∈ R do
    For each s ∈ S do
        if r.C = s.C then output r,s pair

nr is the number of tuples in R, br is the number of blocks in R

The textbook say that in worst case, the buffer can only hold one block and it takes nr ∗ bs + br times of transfer.

But I think, if the buffer can only hold one block,then every time after we finish loading the blocks in S, the block of R we loaded in before will be swapped out. Then if we want to visit the next tuple of R, shouldn't we load the block again even if we have loaded it before? ie I initially load the first tuple in R, in order to do so I need to load the first block in R; then I load the entire S, and compare their key. After I finish the inner loop, the first block of R has been swapped out, so when I try to load the second tuple in R, I need to load the first block again. Therefore I need load R every time when I try to access a tuple in R.

So why the worst case is not nr ∗ bs + nr?


Solution

  • In the worst case the database buffer can hold only one block of each relation. Hence let's assume that block is a block of R. Now for each tuple in R, we will check it for join condition with each tuple of S.

    Buffer has 1 block of R and 1 block of S.

    Total blocks of R = B(R)
    Total tuples in R = N(R)
    Total blocks of S = B(S)
    
    R X S is done.
    

    There are 2 spaces in the db buffer. 1st space can be filled B(R) no. of times. 2nd space will be filled N(R) X B(R) no. of times as each tuple in R will be matched with each tuple in S.

    Hence B(R) + N(R) X B(R) disk accesses will take place.