sql database relational-database relational-model

Does every table really need an auto-incrementing artificial primary key?

Almost every table in every database I've seen in my 7 years of development experience has an auto-incrementing primary key. Why is this? If I have a table of U.S. states where each state where each state must have a unique name, what's the use of an auto-incrementing primary key? Why not just use the state name as the primary key? Seems to me like an excuse to allow duplicates disguised as unique rows.

This seems plainly obvious to me, but then again, no one else seems to be arriving at and acting on the same logical conclusion as me, so I must assume there's a good chance I'm wrong.

Is there any real, practical reason we need to use auto-incrementing keys?

Solution

No.

In most cases, having a surrogate INT IDENTITY key is an easy option: it can be guaranteed to be NOT NULL and 100% unique, something a lot of "natural" keys don't offer - names can change, so can SSN's and other items of information.

In the case of state abbreviations and names - if anything, I'd use the two-letter state abbreviation as a key.

A primary key must be:

unique (100% guaranteed! Not just "almost" unique)
NON NULL

A primary key should be:

stable if ever possible (not change - or at least not too frequently)

State two-letter codes definitely would offer this - that might be a candidate for a natural key. A key should also be small - an INT of 4 bytes is perfect, a two-letter CHAR(2) column just the same. I would not ever use a VARCHAR(100) field or something like that as a key - it's just too clunky, most likely will change all the time - not a good key candidate.

So while you don't have to have an auto-incrementing "artificial" (surrogate) primary key, it's often quite a good choice, since no naturally occuring data is really up to the task of being a primary key, and you want to avoid having huge primary keys with several columns - those are just too clunky and inefficient.