We are using SQL Server 2008 Enterprise version. We have a large table FooTable (billions of rows).
FooTable Columns: site:varchar(7), device:varchar(7), time(datetime), value(float)
Every day we insert millions of new rows.
We created a clustered index for site
, device
and time
(in order).
As we can see, site
and device
are relatively constant, but time
will keep changing as time goes by.
The queries executed against this table would be:
INSERT INTO FooTable SELECT * FROM #BULK_INSERTED_TEMP_TABLE
SELECT value FROM FooTable WHERE site = 'fooSite' AND device = 'fooDevice' AND time = 'fooTime'
SELECT SUM(value) FROM FooTable WHERE site = 'fooSite' AND device = 'fooDevice' AND time > 'startTime' AND time <= 'endTime'
What is the best clustered index design?
There's no one true answer for the best clustered index design. In general, I look at clustered indexes two ways. First, they store the data, so you need to consider them from the data storage aspect. Are you creating a cluster that is likely to be constantly splitting pages as new data arrives? Second, because they store data, you should consider the queries that are going to be used most frequently to retrieve the data. Will those queries be able to use the clustered index to get at the data?
Knowing next to nothing about your set up, do you have an optimal choice for clustered index? I would say possibly not. What you've defined is a valid primary key candidate, but the structure you've outlined, with the two columns that are going to group the data into a particular structure combined with an ever increasing piece of data which will cause insertions all over the place within the distribution of the first two columns suggests you're going to be looking at lots of page splits. That may or may not be an issue, but it's something you'll need to monitor.