Search code examples
sqlsql-serverindexingquery-optimization

How Non-Clustered affects Clustered Index (PK)


Columns          | Data Type     | Allow Null
-----------------|---------------|-----------
Id (PK)          | int           | Unchecked
                 |               |  
Type             | tinyint       | Unchecked
                 |               |
Severity         | tinyint       | Unchecked
                 |               |
Message          | varchar(1000) | Unchecked
                 |               |
ReferenceTypeId  | int           | Checked
                 |               |
ReferenceId      | int           | Checked
                 |               |
ParentId         | int           | Checked
                 |               |
ParentTypeId     | int           | Checked
                 |               |
Created          | datetime      | Checked

Okay, in the above table I have 10996674 records. When i run the following Query select count(id) from DTL_Event it takes 1:23 mins.

After applying Indexing

CREATE NONCLUSTERED INDEX DTL_Event_Optimization_Date on DTL_Event([Created])
CREATE NONCLUSTERED INDEX DTL_Event_Optimization_TypeIds_Composite on DTL_Event([ReferenceTypeId],[ParentTypeId])
CREATE NONCLUSTERED INDEX DTL_Event_Optimization_Reference_Parent_Ids_Composite on DTL_Event([ReferenceId],[ParentId])

The same query select count(id) from DTL_Event takes 1 sec. I need to know why, because Id is a primary key which supposed to be indexed already and I am applying indexing on other columns thats why i have a Question:

Why NonClustered Indexing on other Columns optimizes this query select count(id) from DTL_Event. i.e. Why applying Nonclustered indexing on other columns helps me faster retrieval of count of Id(s)


Solution

  • After adding a nonclustered index QO is choosing one of non clustered clustered index simply because that is most efficient way to do because of "page count" in the index.

    QO might choose (most of the time) NON-clustered index with the lowest page count overall.

    As you might know nonclustered indexes has unique identifier for lookup in the other indexes/heap in the leaf level.

    Why NonClustered Indexing on other Columns optimizes this query select count(id) from DTL_Event. i.e. Why applying Nonclustered indexing on other columns helps me faster retrieval of count of Id(s)

    In your case that is you primary (clustered and unique) key ID, so every nonclustered has ID column in the leaf level for example you create a nonclustered index like

    >   CREATE NONCLUSTERED INDEX DTL_Event_Optimization_Date on
    > DTL_Event([Created])
    

    Then it will automatically include column ID

    >   CREATE NONCLUSTERED INDEX DTL_Event_Optimization_Date on
    > DTL_Event([Created]) 
    > INCLUDE (ID)
    

    So actually after adding nonclustered index its quite efficient to look on one of the nonclustered index for the rowcount instead of looking on to clustered index you can look into the same in execution plan

    TEST RIG :

    CREATE Table DTL_Event (ID INT not null , TYPE tinyint not null  , Severity tinyint  not null ,[Message] varchar(1000) not null , ReferenceTypeId int ,ReferenceId int
    
     ,ParentId int , ParentTypeId int ,Created datetime   , primary key (ID))
    

    Without Non-clustered indeX

      select count(id) from DTL_Event
    

    EXECUTION PLAN enter image description here

    WITH NONCLUSTERED INDEX

     CREATE NONCLUSTERED INDEX DTL_Event_Optimization_Date on DTL_Event([Created])
    
    
    
    select count(id) from DTL_Event
    

    EXECUTION PLAN enter image description here

    BUT if your goal is to count total number of rows present in the table best way is to do that

    SELECT SUM (row_count)
    FROM sys.dm_db_partition_stats
    WHERE object_id=OBJECT_ID('Transactions')   
    AND (index_id=0 or index_id=1);