Search code examples
sql-serverperformancetimeclustered-index

Performance issue with clustered index for changing date column in SQL Server


We are using SQL Server 2008 Enterprise version. We have a large table FooTable (billions of rows).

FooTable Columns: site:varchar(7), device:varchar(7), time(datetime), value(float)

Every day we insert millions of new rows.

We created a clustered index for site, device and time (in order).

As we can see, site and device are relatively constant, but time will keep changing as time goes by.

The queries executed against this table would be:

  1. INSERT INTO FooTable SELECT * FROM #BULK_INSERTED_TEMP_TABLE

  2. SELECT value FROM FooTable WHERE site = 'fooSite' AND device = 'fooDevice' AND time = 'fooTime'

  3. SELECT SUM(value) FROM FooTable WHERE site = 'fooSite' AND device = 'fooDevice' AND time > 'startTime' AND time <= 'endTime'

What is the best clustered index design?


Solution

  • There's no one true answer for the best clustered index design. In general, I look at clustered indexes two ways. First, they store the data, so you need to consider them from the data storage aspect. Are you creating a cluster that is likely to be constantly splitting pages as new data arrives? Second, because they store data, you should consider the queries that are going to be used most frequently to retrieve the data. Will those queries be able to use the clustered index to get at the data?

    Knowing next to nothing about your set up, do you have an optimal choice for clustered index? I would say possibly not. What you've defined is a valid primary key candidate, but the structure you've outlined, with the two columns that are going to group the data into a particular structure combined with an ever increasing piece of data which will cause insertions all over the place within the distribution of the first two columns suggests you're going to be looking at lots of page splits. That may or may not be an issue, but it's something you'll need to monitor.