Search code examples
sql-servertimestampclustered-indexrowversion

SQL Server: Clustering by timestamp; pros/cons


I have a table in SQL Server, where i want inserts to be added to the end of the table (as opposed to a clustering key that would cause them to be inserted in the middle). This means I want the table clustered by some column that will constantly increase.

This could be achieved by clustering on a datetime column:

CREATE TABLE Things (
    ...
    CreatedDate datetime DEFAULT getdate(),
    [timestamp] timestamp,        

    CONSTRAINT [IX_Things] UNIQUE CLUSTERED (CreatedDate)
)

But I can't guaranteed that two Things won't have the same time. So my requirements can't really be achieved by a datetime column.

I could add a dummy identity int column, and cluster on that:

CREATE TABLE Things (
    ...
    RowID int IDENTITY(1,1),
    [timestamp] timestamp,        

    CONSTRAINT [IX_Things] UNIQUE CLUSTERED (RowID)
)

But you'll notice that my table already constains a timestamp column; a column which is guaranteed to be a monotonically increasing. This is exactly the characteristic I want for a candidate cluster key.

So I cluster the table on the rowversion (aka timestamp) column:

CREATE TABLE Things (
    ...
    [timestamp] timestamp,        

    CONSTRAINT [IX_Things] UNIQUE CLUSTERED (timestamp)
)

Rather than adding a dummy identity int column (RowID) to ensure an order, I use what I already have.

What I'm looking for are thoughts of why this is a bad idea; and what other ideas are better.

Note: Community wiki, since the answers are subjective.


Solution

  • So I cluster the table on the rowversion (aka timestamp) column: Rather than adding a dummy identity int column (RowID) to ensure an order, I use what I already have.

    That might sound like a good idea at first - but it's really almost the worst option you have. Why?

    The main requirements for a clustered key are (see Kim Tripp's blog post for more excellent details):

    • stable
    • narrow
    • unique
    • ever-increasing if possible

    Your rowversion violates the stable requirement, and that's probably the most important one. The rowversion of a row changes with each modification to the row - and since your clustering key is being added to each and every non-clustered index in the table, your server will be constantly updating loads of non-clustered indices and wasting a lot of time doing so.

    In the end, adding a dummy identity column probably is a much better alternative for your case. The second best choice would be the datetime column - but here, you do run the risk of SQL Server having to add "uniqueifiers" to your entries when duplicates occur - and with a 3.33ms accuracy, this could definitely be happening - not optimal, but definitely much better than the rowversion idea...