Search code examples
sql-servert-sqlcompound-key

In a junction table, should I use a Primary key and a unique constraint, or a compound/composite key?


I have read through handfuls of what would seem to make this a duplicate question. But reading through all of these has left me uncertain. I'm hoping to get an answer based on the absolute example below, as many questions/answers trail off into debates back and forth.

If I have:

dbo.Book
--------
BookID PK int identity(1,1)

dbo.Author
----------
AuthorID PK int identity(1,1)

Now I have two choices for a simple junction table:

dbo.BookAuthor
--------------
BookID CPK and FK
AuthorID CPK and FK

The above would be a compound/composite key on both FKs, as well as set up the FK relationships for both columns - also using Cascade on delete.

OR

dbo.BookAuthor
--------------
RecordID PK int identity(1,1)
BookID FK
AuthorID FK

Foreign key relationships on BookID and AuthorID, along with Cascade on delete. Also set up a unique constraint on BookID and AuthorID.

I'm looking for a simple answer as to why one method is better than another in the ABOVE particular example. The answers that I'm reading are very detailed, and I was just about to settle on a compound key, but then watched a video where the example used an Identity column like my first example.

It seems this topic is slightly torn in half, but my gut is telling me that I should just use a composite key.

What's more efficient for querying? It seems having a PK identity column along with setting up a unique constraint on the two columns, AND the FK relationships would be more costly, even if a little.


Solution

  • As a staunch proponent of, and proselytizer for, the benefits of surrogate keys, I none-the-less make an exception for all-key join tables such as your first example. One of the benefits of surrogate keys is that engines are generally optimized for joining on single integer fields, as the default and most common circumstance.

    Your first proposal still obtains this benefit, but also has a 50% greater fan-put on each index level, reducing both the overall size and height of the indices on the join table. Although the performance benefits of this are likely negligible for anything smaller than a massive table it is best practice and comes at no cost.

    When I might opt for the other design is if the relation were to accrue additional columns. At that point it is no longer strictly a join table.