I'm doing an online course and it gives the following relvar example and then says:
Hmm. Although answer and answercode are different in each tuple—making each tuple technically unique—the values for attributes id, questionID, and answertype are exactly the same. Normalization is in orderd.
Question is, what is considered too much redundancy? Where is the line drawn before we start going through normalization steps?
Normalization isn't something used to reduce redundant values. It's used to reduce redundant associations between values. In other words, it's meant to ensure that every fact is recorded only once.
To determine whether you need to normalize, write out the functional dependencies and look for causes of anomalies like transitive dependencies, dependencies on part of a key, etc. Each of these types of redundancies are clearly defined in the descriptions of normal forms online and in text books.
In your example, if I assume that questionID -> answertype
, (questionID, answercode) -> answer
, and that id
is equivalent to questionID
and can be ignored, then the relation is only in 1NF since answertype
is a non-prime attribute dependent on part of the key (questionID, answercode)
. If an update were to change the value of some of the shown answertypes (but not all), your data would be logically corrupt. Normalization could have prevented this.
Your clients care about accuracy and consistency. Do you?