Search code examples
sqldatabasedatabase-normalization

Database normalization and repeated values in different rows


I think I was under the false impression that part of normalizing a data base is replacing duplicate values in rows with id numbers. Consider the following:

(employ table) 

id(pk   name    branch
-------------------------------
1       dan      nyc_union
2       ron      la_skidrow
3       seth     nyc_union

The same branch can appear more than once and there is only one nyc_union branch, so to avoid spelling mistakes (NYC, ny) and redundancy we create a second table for branches and link it with a relationship to the employe table.

But after reading the following https://www.studytonight.com/dbms/third-normal-form.php

If you look at the table example they have in the link above, to satisfy the second rule of normalization they use a table which repeats the value for branch without creating another table, and yet they claim it's normalized. I've seen other similar examples.

What if we had more than one branch per location would that change the answer? Meaning now we can just put this attribute in the employe tablre without creating another table?

In which cases do we have more then one column of real values, not links to ids?

IN CONCLUSION

As a rule of thumb to follow normalization we should use only one 'real non linked id value' per table and the rest of the values should be links to ids in other tables. Is this correct?


Solution

  • The key point about normalization is not to avoid duplicate values: several people can have the same name, but that is not a reason to break out a names table and reference it with a foreign key.

    Normalization comes into play where there is another column, for example

    id   name    branch       address
    ---------------------------------
    1    dan     nyc_union    1 Union Street
    2    ron     la_skidrow   22 Skid Row
    3    seth    nyc_union    1 Union Street
    

    In this case you have a “functional dependency”, namely the address of the branch depends on the branch. That is, for any two employees in the same branch, the address would be the same.

    To normalize this denormalized table, we'd move branch and address to its own table with branch as the primary key and change the employee table to the one in your question, where branch is a foreign key to our new branch table.

    This has two advantages:

    • It cannot happen that someone updates the address in one employee record, but forgets to perform the same update in all other employees of that branch (which would cause an inconsistency).

    • You don't have to store the same address over and over and save some space.