Search code examples
databasedatabase-normalization

Why does a database table expressing all possible combinations of values of 2 attributes violate 4NF?


Sometimes a database table is supposed to contain every possible combination of two (independent) things. I know such a table is subject to deletion anomalies of the kind that 4NF is supposed to prevent, and so I assume such a table must violate Fourth Normal Form (4NF).

Which specific dependencies applicable to such a table are in violation of 4NF?

Example: Suppose there's a company that supplies custom T-shirts. On their website, a customer chooses a Color ('Red', 'Green', 'Blue', 'Yellow', etc.) and a Picture ('Robot', 'Motorcycle', 'Dinosaur', 'Globe', etc.), the idea being that the company can print the chosen Picture onto a T-shirt of the chosen Color and send the resulting T-shirt out to the customer.

In real life of course some additional parameters would likely be available for selection, like Size, but let's keep things simple: only Color and Picture are available for selection, and they're totally independent of each other.

Suppose, as a database designer for this company, I specify a table called Available_Shirt_Options that has just the following columns: Color_ID, Picture_ID. This table is intended to capture every Color+Picture combination that a customer could possibly select. If 20 Colors exist and 1,000 Pictures exist, then you would expect the Available_Shirt_Options table to have 20,000 rows in it.

If the company wants to stop supplying Green t-shirts then by rights I should delete every row with the Color_ID that represents Green, and there's nothing to stop me from messing this up by deleting most of the Green rows but leaving a one of them undeleted, like the row for Green + Motorcycle.

But what's the technical reason why Available_Shirt_Options violates 4NF?

The only multivalued dependencies I see are Color_ID ->> Picture_ID and Picture_ID ->> Color_ID, which are dependencies on something that isn't a superkey but both these multivalued dependencies are "trivial" multivalued dependencies - i.e. in both cases, the dependency involves all of the columns of the table, that is, all of the attributes of the relation. 4NF places no restrictions on the trivial multivalued dependencies you're allowed to have, so those particular dependencies can't be the problem.

So, which dependencies are the problem?


Solution

  • TL;DR Your constraint is expressed by each of the non-trivial MVDs {}->>{Color_ID} and {}->>{Picture_ID}. They are not out of superkeys, so not all non-trivial MVDs that hold are out of superkeys, so the relation variable violates 4NF. (It's the natural join of its projections on {Color_ID} and {Picture_ID}.) The MVDs you give hold and are trivial, but they don't express your constraint, and you don't show, per a quoted MVD definition, that they do. You don't show, per quoted MVD & 4NF definitions, that all the non-trivial MVDs that hold are out of superkeys. 4NF is not intended to lack deletion or insertion anomalies nor all update anomalies, but you don't justify your expectations otherwise.


    Your constraint is expressed by each of the non-trivial MVDs {}->>{Color_ID} and {}->>{Picture_ID}. I.e. it is expressed by {}->>{Color_ID}|{Picture_ID}. I.e. the relation variable value is always the natural join of its projections on {}∪{Color_ID} and {}∪{Picture_ID}. I.e. it is expressed by the (binary) JD *{{Color_ID}, {Picture_ID}}. I.e. the relation variable value is always the natural join of its projections on {Color_ID} and {Picture_ID}.

    Those MVDs are not out of superkeys, so not all non-trivial MVDs that hold are out of superkeys, so the relation variable violates 4NF.

    Because the following multivalued dependencies hold - 1) Color_ID ->> Picture_ID, 2) Picture_ID ->> Color_ID - the Available_Shirt_Options table is clearly vulnerable to modification anomalies

    That is false. You don't justify it. Correct would be, because your constraint holds. But those MVDs don't express your constraint.

    Maybe you think X->>Y means NOT(X->Y). It doesn't. Using S for the set of attributes of relation variable R, X->>Y means R = (PROJECT X∪Y R) NATURAL JOIN (PROJECT X∪(S-Y) R).

    The MVDs you give, {Color_ID}->>{Picture_ID} and {Picture_ID}->>{Color_ID}, do hold, and they are trivial, but they don't express your constraint. {Color_ID}->>{Picture_ID} says what {Color_ID}->>{} and {Color_ID} ->> {}|{Picture_ID} and JD *{{Color_ID}, {Picture_ID, Color_ID}} all say, that the relation variable value is always the natural join of its projections on {Color_ID} and {Color_ID, Picture_ID}. Similarly {Picture_ID}->>{Color_ID} says what {Picture_ID}->>{} and {Picture_ID}->>{}|{Color_ID} and JD *{{Picture_ID}, {Picture_ID, Color_ID}} all say, that the relation variable value is always the natural join of its projections on {Color_ID} and {Picture_ID, Color_ID}.

    Anyway since your MVDs are trivial, so hold given only what the attributes are, they can't be expressing your constraint.

    it adheres to 4NF? Because although the multivalued dependencies Color_ID ->> Picture_ID and Picture_ID ->> Color_ID are dependencies on something that isn't a superkey, both these multivalued dependencies are "trivial"

    But: You don't characterize, per a quoted MVD definition, all the non-trivial MVDs that hold. You don't show, per a quoted 4NF definition, that all the non-trivial MVDs that hold are out of superkeys.

    so Available_Shirt_Options is in 4NF, right? Yet this seems counterintuitive and wrong, because you'd expect a table like Available_Shirt_Options that suffers from deletion anomalies, etc., to violate some normal form.

    But: You don't say why there should be no anomalies in 4NF. 4NF lacks update anomalies caused by MVDs, but it doesn't lack update anomalies caused by non-binary JDs (join dendencies). Update anomalies are first absent above 4NF at ETNF before 5NF. Moreover NFs in the sequence up to 6NF are only designed to lack update anomalies, not deletion & insertion anomalies.

    If you quoted definitions & tried to give a clear full argument for every claim via small clearly correct steps then you would face that you don't now soundly justify your claims and reasoning. Including that your two MVDs capture your constraint, that all non-trivial MVDs that hold are out of superkeys and that all anomalies or update anomalies should be absent. You might find that you can't justifiably claim them. You might see that they are wrong.

    (An MVD says what a certain binary JD says, that a relation is the natural join of a certain pair of its projections. It is much easier to reason in terms of an MVD's binary JD & its natural join reconstructing a relation from two projections than it is to reason in terms of the original MVD definition involving subtuples.)