I am using a fact table with the following structure in SQL Server 2012:
CREATE TABLE [dbo].[factTable] (
[Id] BIGINT IDENTITY (1, 1) NOT NULL,
[Date] DATE NOT NULL,
[MinuteNumber] SMALLINT NOT NULL,
[CityId] INT NOT NULL, /* Foreign key to dimCity */
[Value] DECIMAL(12, 4) NULL
)
I have a clustered index on the Date
column with a fill factor of 100. The data inserted into this table is almost always in the ascending order of Date
and MinuteNumber
.
I want to know - if having the Id column is necessary in the given scenario? Does it have any performance implications? Or can I safely eliminate it.
I also want to know if having clustered index on Date
column is sufficient (there will be many records with the same date, even same date and same minute-number) or is it better to have a clustered index combining multiple columns; and what are the performance and storage implications for either approach?
I am new to this and any help will be highly appreciated.
In your case, I'd probably create a nonclustered primary key on the identity column, to allow for easier FK relationship management and for performance.
The clustered key would be on the date
column, to allow for faster range queries. The date
column also fulfills the three basic requirements for a clustered index: it's narrow (to make nonclustered indexes smaller), it's stable (because a change on a CI column means reshuffling the NC indexes as well, this is to be avoided) and it's increasing (to avoid bad page splits, the ones not at the end of the table).
WRT non-unique clustered index, SQL Server will add a uniquifier data to it if it's not unique.