I think I was under the false impression that part of normalizing a data base is replacing duplicate values in rows with id numbers. Consider the following:
(employ table)
id(pk name branch
-------------------------------
1 dan nyc_union
2 ron la_skidrow
3 seth nyc_union
The same branch can appear more than once and there is only one nyc_union branch, so to avoid spelling mistakes (NYC, ny) and redundancy we create a second table for branches and link it with a relationship to the employe table.
But after reading the following https://www.studytonight.com/dbms/third-normal-form.php
If you look at the table example they have in the link above, to satisfy the second rule of normalization they use a table which repeats the value for branch without creating another table, and yet they claim it's normalized. I've seen other similar examples.
What if we had more than one branch per location would that change the answer? Meaning now we can just put this attribute in the employe tablre without creating another table?
In which cases do we have more then one column of real values, not links to ids?
IN CONCLUSION
As a rule of thumb to follow normalization we should use only one 'real non linked id value' per table and the rest of the values should be links to ids in other tables. Is this correct?
The key point about normalization is not to avoid duplicate values: several people can have the same name, but that is not a reason to break out a names
table and reference it with a foreign key.
Normalization comes into play where there is another column, for example
id name branch address
---------------------------------
1 dan nyc_union 1 Union Street
2 ron la_skidrow 22 Skid Row
3 seth nyc_union 1 Union Street
In this case you have a “functional dependency”, namely the address of the branch depends on the branch. That is, for any two employees in the same branch, the address would be the same.
To normalize this denormalized table, we'd move branch
and address
to its own table with branch
as the primary key and change the employee table to the one in your question, where branch
is a foreign key to our new branch table.
This has two advantages:
It cannot happen that someone updates the address in one employee record, but forgets to perform the same update in all other employees of that branch (which would cause an inconsistency).
You don't have to store the same address over and over and save some space.