Search code examples
ormprimary-keydatabase-agnosticdesign-decisions

Primary Keys - Native, Sequence, or GUID keys?


In reading this and this and then reading this (which references the other two ironically) I find myself wondering just how big the discussion of this topic is? I am a SQL Server guy and so I tend to use an Identity that is auto generated in the form of an int. However, when I know that I will need some form of replication between server and server or synchronization between client and server I tend to use a GUID as my key.

Question: Should I just use a GUID as my primary key across all tables at all times just in case I might need this possible scalability down the road? Does this make my schema more flexible in that it can be migrated between platforms at all times? Does this help me maintain the flexibility of my ORM (regardless of flavor) by not embedding platform specific features?

Responses:

@David Archer: Per your comment I updated my post to not say "Natural Key". Your are correct in that natural key is defined as such. Thanks for the correction.


Solution

  • I tend to prefer application-generated primary keys, typically using the lo/hi algorithm as implemented by NHibernate (when I'm using it on a project). Otherwise, sequential GUIDs work just as well. This isn't just my advice but rather of several folks who have been doing this whole development thing a lot longer than myself.

    The problem I see with using DB generated primary keys is that you have to hit the database to get those identity values versus having everything set up before you persist it to the database. It typically breaks the Unit of Work pattern in NHibernate as well due to this fact. If you're not using the UoW pattern in your app, then obviously this drawback doesn't apply.

    If you are using GUIDs for your PK, you definitely want to use sequential GUIDs to eliminate index fragmentation. This also gives you the "rough sort order" that another poster mentioned although I'd typically have a DateInserted column or similiar for those types of things.

    Joining on a GUID column has been shown to have a fairly minimal performance overhead versus your 4-byte integer and I'd venture to say that for non-large datasets, the performance difference is trivial.

    Natural keys are the spawn of the devil. :)