If a table has 'id' (the primary key) column as its clustered index, is there any benefit in adding 'id' column as an included column in any other non-clustered index in Microsoft SQLServer?
eg:- Table 'xyz'
id | name | status | date |
---|---|---|---|
1 | abc | active | 2021-06-23 |
CREATE NONCLUSTERED INDEX [NonClusteredIndex_status_Date]
ON [xyz]
(
[status] ASC,
[date] ASC
)
INCLUDE
( [id],
[name]
)
And this non-clustered index is targeted for a query similar to bellow on a large data set. In the actual case there could be some other queries as well.
select * from xyz where status='active' and date > '2021-06-20'
This answer assumes you are planning to run the following query:
SELECT * FROM xyz WHERE status = 'active' AND date > '2021-06-20';
If you only created a non clustered index on (status, date)
, then it would cover the WHERE
clause, but not the SELECT
clause. What this means is that SQL Server might choose to use the index to find the matching records in the query. But when it gets to evaluating the SELECT
clause, it would be forced to seek back to the clustered index to find the values for the columns not included in the index, other than the id
clustered index column (which includes the name
column in this case). There is performance penalty in doing this, and SQL Server might, depending on your data, even choose to not use the index because it does not completely cover the query.
To mitigate this, you can define the index in your question, where you include the name
value in the leaf nodes. Note that id
will be included by default in the index, so we do not need to explicitly INCLUDE
it. By following this approach, the index itself is said to completely cover the query, meaning that SQL Server may use the index for the entire query plan. This can lead to fast performance in many cases.