In Scaling Up Your Data Warehouse with SQL Server 2008 R2, the author recommends using an integer date key in the format of YYYYMMDD as a clustered index on your fact tables to help optimize query speed.
What is the best way to convert your key date field to the Date Key? I feel the following would work, but is a bit sloppy:
select Replace(CONVERT(varchar,GETDATE(),102),'.','')
Clearly, I'm not using getdate, but rather a date column in the table that will be using in my aggregations.
First, how would you suggest making this conversion? Is my idea acceptable?
Second, has anyone had much success using the Date Key as a clustered index?
ISO long (112) would do the trick:
SELECT CONVERT(INT, CONVERT(VARCHAR(8), GETDATE(), 112))
Casting getdate() straight to int with ISO 112 gives 41008 for some reason, but going via a VARCHAR seems to work - i'll update if i think of a faster cast.
EDIT: In regards to the int only vs varchar debate, here are my findings (repeatable on my test rig & production server) Varchar method uses less cpu time for half a million casts but a fraction slower overall - negligible unless your dealing with billions of rows
EDIT 2: Revised test case to clear cache and differnt dates
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
SET STATISTICS TIME ON;
WITH RawDates ( [Date] )
AS ( SELECT TOP 500000
DATEADD(DAY, N, GETDATE())
FROM TALLY
)
SELECT YEAR([Date]) * 10000 + MONTH([Date]) * 100 + DAY([Date])
FROM RawDates
SET STATISTICS TIME OFF
(500000 row(s) affected)
SQL Server Execution Times:
CPU time = 218 ms, elapsed time = 255ms.
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
SET STATISTICS TIME ON;
WITH RawDates ( [Date] )
AS ( SELECT TOP 500000
DATEADD(DAY, N, GETDATE())
FROM TALLY
)
SELECT CONVERT(INT, CONVERT(VARCHAR(8), [Date], 112))
FROM RawDates
SET STATISTICS TIME OFF
(500000 row(s) affected)
SQL Server Execution Times:
CPU time = 266 ms, elapsed time = 602ms