Search code examples
relational-databaserelational-algebra

I don't understand the answers of this question with natural join and projection


I am following a MOOC, but I don't understand the correct answer nor the other answers.
The MOOC closed and I cannot ask any questions on the forum.

This is the question:

Considering the following relation R:
A B C D
1 0 2 2
4 1 2 2
6 0 6 3
7 1 2 3
1 0 6 1
1 1 2 1

Between all these requests, which one return the same relation R?

  1. ΠA,B,C,D(R⋈δA→D,D→F(R))
  2. R⋈δA→D,D→A(R)
  3. R⋈δB→C,C→B(R)
  4. ΠA,B,C,D(R⋈δB→G,C→F(R)) (note: this is the correct answer)

The only given explanation is :

The first 3 answers loose the tuple(4,1,2,2). In the last joint, no tuple is lost.

Could you details please whats does the answers do?
Thank you very much for your attention!


Solution

  • This is a question about the Relational Algebra's Natural Join, and attribute naming. I presume the squiggly thing in your formulas is for Rename, usually denoted by Greek letter rho ρ (see the wikipedia link).

    For Natural Join see the wikipedia example and note

    The result of the natural join is the set of all combinations of tuples in R and S that are equal on their common attribute names.

    Because of the renaming in the four formulas, in general, the result from renamed R will not have the same attribute names as the original R, or will not be equal on the values in the resulting same-named attributes.

    I suggest you go through each four of the renamings, and work out what is the 'heading' of each result -- that is, what are the resulting attribute names.

    You'll find in requests 1., 2., 3. there's at least one resulting attribute same-named as the original R but the values for that attribute are not the same.

    In request 4., although attributes B, C are renamed, their new names do not clash with any existing attribute in R. So the Natural Join to original R will use attributes A, D. This'll produce an interesting intermediate result: consider the tuples <1, 0, 6, 1>, <1, 1, 2, 1> which each contain equal values in their A attribute and their D attribute.

    But then in request 4., the projection will throw away the newly-named attributes G, F and collapse back to the original A, B, C, D. So in general, request 4. always returns exactly the original R.

    Requests 1., 2., 3. might sometimes return the original R, depending on the content of R. But with the content you show, there are clashes of newly-same-named attributes with non-equal values, so they do 'lose' tuples.

    BTW, although tuple <4, 1, 2, 2> does indeed get 'lost' in those three requests, it's not the only tuple that gets 'lost'. In particular in request 3., note that for the sample data, there are no values in common between B, C, so swapping them round in the rename has the effect of returning an empty result from the Join.