Search code examples
sqldatabase-designsurrogate-keynatural-key

Surrogate key 'preference' explanation


As I understand there is a war going on between purists of natural key and purists of surrogate key. In likes to this this post (there are more) people say 'natural key is bad for you, always use surrogate...

However, either I am stupid or blind but I can not see a reason to have surrogate key always!

Say you have 3 tables in configuration like this: Link table

Why would I need a surrogate key for it?? I mean it makes perfect sense not to have it.

Also, can someone please explain why primary keys should never change according to surrogate key purists? I mean, if I have say color_id VARCHAR(30) and a key is black, and I no longer need black because I am changing it to charcoal, why is it a bad idea to change black key to charcoal and all referencing columns too?

EDIT: Just noticed that I dont even need to change it! Just create new one, change referencing columns (same as I would have to do with surrogate key) and leave old one in peace....

In surrogate key mantra I need to then create additional entry with, say, id=232 and name=black. How does that benefit me really? I have a spare key in table which I don't need any more. Also I need to join to get a colour name while otherwise I can stay in one table and be merry?

Please explain like to a 5 year old, and please keep in mind that I am not trying to say 'surrogate key is bad', I am trying to understand why would someone say things like 'always use surrogate key!'.


Solution

  • Surrogate keys are useful where there is an suboptimal natural key: no more, no less. A suboptimal natural key would be a GUID or varchar or otherwise wide/non-ordered.

    However, the decision to use a surrogate is an implementation decision after the conceptual and logical modelling process, based on knowledge of how the chosen RDBMS works.

    However, this best practice of "have a surrogate key" is now "always have a surrogate key" and it introduced immediately. Object Relation Mappers also often add surrogate keys to all tables whether needed or not which doesn't help.

    For a link (many-many) table, you don't need one: SQL: Do you need an auto-incremental primary key for Many-Many tables?. For a table with 2 int columns, the overhead is an extra 50% of data for a surrogate column (assuming ints and ignoring row metadata)