Search code examples
database-designdependenciesdatabase-normalizationrelational-algebrafunctional-dependencies

Explain non additive propery using a Boyce Codd Normal Form decomposition example


I have read the definition and sort of get it - after a lossless join you should not lose any information. But why does this specific example go for the third decomposition solution? This example is from the Fundamentals of Database Systems, by Elmasri and Navathe.

We have a table called TEACH with columns Student, Teacher, and Course. It is in 3rd Normal Form and we are trying to make it into BCNF (Boyce Codd Normal Form).

TEACH (STUDENT, COURSE, INSTRUCTOR)

STUDENT  COURSE    INSTRUCTOR
Nathan   Database  Mark
Smith    Database  Navathe
Smith    Op Sys    Ammar
Smith    Theory    Schulman
Wallace  Database  Mark
Wallace  Op Sys    Ahamad
Wong     Database  Omienscki
Zelaya   Database  Navathe

The text says there are three possible decompositions for relation TEACH

1) {Student, Instructor} and {Student, Course}
2) {Course, Instructor} and {Course, Student}
3) {Instructor, Course} and {Instructor, Student}

According to the text, there is only two FUnctional Dependencies

1) {student, course} - > instructor

2) instructor -> course

According to the text, only solution 3 is valid as it will not generate spurious tuples, and thus therefore has non additive property.

Spurious tuples come from joining on non primary attributes or non foreign keys. Isn't Student a primary attribute? So why doesn't solution 1 work?

My understanding of why it is solution 3 is that we cannot retrieve information such as "I have an instructor, what course does the instructor teach?" from solutions 1 and 2. This would relate back to the functional dependency in the original table, that Instructor -> Course.


Solution

  • You do not seem to have grasped normalization. You need to learn and use precise definitions.

    Normalization replaces a relation by projections of it that (losslessly/nonadditively) join back to it. A binary decomposition removing a problematic FD (functional dependency) is lossless/nonadditive when it is on the columns of a superkey of one of the components. (A superkey is a superset of a CK (candidate key).) ("Joining on non primary attributes or non foreign keys" does not characterize lossy cases. Why do you think that?) Once you have the components, you must determine what their CKs/superkeys are.

    Presumably you mean that those two FDs form a cover for the original table. (Ie the only FDs that hold are the ones that hold because those ones hold.) Then the only non-trivial FD that holds in the components is Instructor -> Course, and {Instructor} is a CK of component {Instructor, Course}, and the only choice that is a join on a superkey of a component is 3).

    Student is a prime attribute of every component it appears in, which means that it is a member of a CK. But what matters to losslessness is whether a join is on a superkey, and {student} is not a superkey of any component.

    PS 1. It's not that "after a lossless join you should not lose any information". By definition a lossless join is when you don't lose any information. Similarly, "will not generate spurious tuples" means the same thing by definition as "has [the] non additive property".

    PS 2. Your "understanding" is not clearly explained. Suppose the original relation holds rows where "instructor INSTRUCTOR teaches course COURSE to student STUDENT". When FD Instructor -> Course holds in the original it can also be described as holding the rows where "instructor INSTRUCTOR teaches course COURSE AND instructor INSTRUCTOR teaches student STUDENT". Since the meaning of the JOIN of tables is the AND/conjunction of the meanings of the tables, we can reconstruct the table by joining the two tables for "instructor INSTRUCTOR teaches course COURSE" & "instructor INSTRUCTOR teaches student STUDENT" per 3). But that FD doesn't allow us to rephrase the original meaning with an AND per 1) & 2). So it's not appropriate to replace it by a JOIN in those cases.