Search code examples
databaserelational-databasedatabase-normalizationfunctional-dependencies3nf

How does "OR if Y is a prime attribute" eliminate redundancy for 3NF?


Assuming the schema R(A,B,C,D,E) with the functional depencies AB -> ABCDE (AB is the superkey) and C -> B (B is a part of the key AB, so the schema is in 3NF for all dependencies).

What does it help that B is part of the key (AB)?

Still it would be possible to have AB -> ... C ... -> B, e.g. (1,2,3,4,5) (2,2,3,4,5) where e.g. C is a department number and B is the number of people working in this apartment. It does not eliminate redundancy, even if C is a prime attribute. The same would happen if C would not be a prime attribute.

So how does this definition eliminate redundancy?

How does the definition part "X -> Y is in 3NF, if ... OR if Y is a prime attribute" (the other preconditions are clear) eliminate redundancy at all?

E.g. having (1,2,3,4,5) and (1,x,3,4,6): To satisfy C -> B, x should be 2, but if x would be 2, AB -> ABCDE wouldn't be satisfied, a contradiction.


Solution

  • The “or Y is a prime attribute” weakens the normal form, allowing more redundancy, not less (compared to BCNF, which does not have that “or” clause).

    How does the definition part […] ... OR if Y is a prime attribute" […] eliminate redundancy at all?

    It doesn't. What eliminates redundancy is the other disjunct (“the lhs of every non-trivial FD is a superkey”), which characterizes BCNF and which, as noted by philipxy, removes update anomalies due to FDs. Why that is the case is relatively simple to see: X -> Y asserts that if any two tuples agree on X, then they must agree on Y. So, if I take two tuples with the same values of X attributes and I hide the Y value of one of them, you can infer it by looking at the other tuple. This is the redundancy (information that can be derived from other parts of a relation). By requiring that X is a superkey, you eliminate the problem, because now there cannot be two tuples that agree on X.

    Also noted by philipxy is that 3NF was originally defined in terms of transitive (functional) dependencies, and the definition you mention is a (nice) equivalent characterization that was discovered later.

    In terms of transitive dependencies, you may informally understand the difference between BCNF and 3NF (hence the relevance of the clause you mention) in these terms: while BCNF forbids all transitive dependencies, 3NF forbids only some of them, specifically those involving attributes that do not belong to any key.

    You can always partition the attributes of any relation into two classes: prime attributes (attributes that belong to some key) and what I call “descriptors” (not a widely used terminology), that is, attributes that do not belong to any key. 3NF constrains only the functional dependencies having descriptors on the rhs; BNCF constrains all the functional dependencies.

    Using your example:

    A B C D E
    a0 b0 c0 d0 e0
    a1 ?? c0 d1 e1
    a2 ?? c0 d2 e2

    There is only one way to replace the ?? in the instance above in a way that satisfies your FDs: you can infer the value by looking at the first tuple. That is exactly because 3NF doesn't care that AB -> C and C -> B (and C does not functionally determines AB), because B is part of a key.