Search code examples
indexingdatabase-designrelational-databasesurrogate-keynatural-key

What are the type of data problems that relational databases have when they are designed using natural keys (and not surrogate keys)?


I saw this comment:

[applications] with the most data-related problems were those using natural keys.

Source: Surrogate vs. natural/business keys

I want more supporting evidence of this, as the comment left much to imagination.

It suggests that practice of using natural keys creates data-related problems, but does not specify what goes wrong... does data get corrupt? out of sync? becomes erroneous, lost, damaged? hard to query?

What are the data problems that happen when database is designed with natural keys opposed to using surrogate keys? How can those type of problems can be prevented when using surrogate keys?


Solution

  • The main issue with natural keys is how it affects related tables. If you change the value of the key, then you must correct every row in every table that references the original value.

    For example, suppose you have a Zip Code or Postal Code table. Quite often, these are designed where the Postal Code also serves as the natural key. Now suppose the Post Office changes a particular Postal Code (92680 becomes 92780). When you change the key in the Postal Code table, you must then go to every table that references that Postal Code, and update it there as well. So every row in the customer address, vendor address, etc... that has 92680 in the Postal Code has to be changed to 92780.

    Obviously, if the related tables are not remediated, you can start to have big problems. Let's say you charge insurance premiums based upon postal code. Imagine the issues you could have if these are not fixed in the premium table.

    Using Surrogate keys eliminates this problem altogether. You simply change the postal code in the Postal Code table. The surrogate key is not changed. And since the related tables store the surrogate key, you don't have to change anything else.