sql sql-server sql-server-2008 indexing non-clustered-index

How to calculate the Num_Key_Cols for a Non-Clustered Index with included columns in it?

I'm trying to estimate the index size using the information provided at the MSDN website.

Let us consider a table "Table1" with three columns in it. The columns are listed below,

Id int, not null
Marks int, not null
SubmitDate Date, not null

Initially, I have created a clustered primary key on Id column and then planned to create a non-clustered index with "Id" as "index key column" where as, "Marks" and "SubmitDate" columns will be used as "Included Columns" in the index.

Based on the above plan, I was trying to estimate the Non-Clustered index key size before creating it. While going through the MSDN site , I have lot of confusions to be clarified. There are four steps to estimate the nonclustered index key size and at the first step, 1.2 and 1.3 explains about, how to calculate the Num_Key_Cols, Fixed_Key_Size, Num_Variable_Key_Cols and Max_Var_Key_Size. But at 1.2 and 1.3, we should calculate based on the Index key type, do we have clustered index key already or not and do we have included columns or not; it seems to be confusing. Can anybody help me based on the info that I have provided about the sample table (Table1) and the Non-Clustered Index Key structure that I would like to create.

In my case, I have included columns, index key column was already a primary key, and all the columns are not null fields. how to calculate the index size for it?

Thanks in advance.

Solution

Initially, I have created a clustered primary key on Id column and then planned to create a non-clustered index with "Id" as "index key column" where as, "Marks" and "SubmitDate" columns will be used as "Included Columns" in the index.

I wouldn't do this. By making Id both the primary key (i.e. unique) and the clustering key of the table (with only 3 columns), there is little point adding a further NC Index on Id - the Clustered index is adequate.

If you had a large number of (on page) columns in the table then there could be a reason to add another NC Index on Id with included columns on (Marks, SubmitDate) as the NC index density would be higher. But this is clearly not the case here.

Some clarification of the MSDN NC sizing link:

Direct index columns need to be considered in all levels of the tree.
INCLUDE columns are present only in the leaf nodes of the tree
Num_Key_Cols = Num_Key_Cols + 1 is only needed when Sql Server adds a 4 byte uniquifier if the Clustering key isn't unique. You've clustered by the Primary Key, so its unique, so not +1.

Remember to also do empirical measurements on real data

Not all tables and indexes will have fixed column widths - many have variable length index columns like (N)VARCHAR, in which case the actual storage consumed by the table and indices is highly dependent on the average lengths of such fields.

In this case, I would suggest creating the table and populating it with approximate data and then measure the actual data storage with tools like sp_spaceused and sys.dm_db_index_physical_stats, e.g.:

select * from sys.dm_db_index_physical_stats (DB_ID(), 
           OBJECT_ID(N'dbo.Table1'), NULL, NULL , 'DETAILED');

(Page size is 8192)

Edit

Just to illustrate, if you already have this in place:

CREATE TABLE Table1
(
    Id INT IDENTITY(1,1),
    Marks INT NOT NULL,
    SubmitDate DATE NOT NULL
);

ALTER TABLE Table1 ADD CONSTRAINT PK_Table1 PRIMARY KEY CLUSTERED (Id);

Then there is no point in doing this as well:

CREATE NONCLUSTERED INDEX IX_Table1 on Table1(Id) INCLUDE (Marks, SubmitDate)

Since the Clustered Index will already seek / scan on ID just as performantly as the NC index - you will just double the storage requirement.

As an aside, also note that in general that the clustered index key is included in all non-clustered indexes automatically, and does not need to be explicitly added to Non clustered indexes.