Search code examples
sql-serversql-execution-plan

Why does SQL Server query optimizer sometimes overlook obvious clustered primary key?


I have been scratching my head on this one.

I run as simple select count(id) on a table with id as clustered integer primary key and the SQL Optimizer totally ignores the primary key in it's query execution plan, in favor of an index on a date field.... ???

Actual table:

CREATE TABLE [dbo].[msgr](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [dt] [datetime2](3) NOT NULL CONSTRAINT [DF_msgr_dt]  DEFAULT (sysdatetime()),
    [uid] [int] NOT NULL,
    [msg] [varchar](7000) NOT NULL CONSTRAINT [DF_msgr_msg]  DEFAULT (''),
    [type] [tinyint] NOT NULL,
    [cid] [int] NOT NULL CONSTRAINT [DF_msgr_cid]  DEFAULT ((0)),
    [via] [tinyint] NOT NULL,
    [msg_id] [bigint] NOT NULL,
 CONSTRAINT [PK_msgr] PRIMARY KEY CLUSTERED 
(
    [id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Please what could be the reason for this?


Solution

  • 1) In my opinion, the key point here is that for clustered tables (tables which have a clustered index = the main data structure = is that data structure that store table data = clustered index is the table itself) the every non-clustered index include also the the key of clustered index. This means that

    CREATE [UNIQUE] NONCLUSTERED INDEX bla 
    ON [dbo].[msgr] (uid) 
    

    is basically the same thing as

    CREATE [UNIQUE] NONCLUSTERED INDEX bla 
    ON [dbo].[msgr] (uid) 
    INCLUDE (id) -- id = key of clustered index
    

    So, for such tables, every record from non-clustered indexes on the leaf pages includes also the key of clustered index. This way, within every non-clustered index and for every leaf record SQL Server store also some kind of pointer to the main data structure.

    2) This means that SELECT COUNT(id) FROM dbo.msgr can be executed using CI but also using NCI because both indexes include the id (key of clustered index) column.

    As a secondary note within this topic, because IDENTITY property (for id column) means a mandatory column (NOT NULL), COUNT(id) is the same thing as COUNT(*). Also, this means that COUNT(msg_id) (also a mandatory / NOT NULL) column is the same thing as COUNT(*). So, it's very likely that execution plan for SELECT COUNT(msg_id) FROM dbo.msgr will use the same NCI (for example bla).

    3) Non-clustered indexes have smaller size than clustered index. This means also less IO => It's better from performance point of view to use the NCI than CI.

    I would do following simple test:

    SET STATISTICS IO ON;
    GO
    
    SELECT COUNT(id)
    FROM dbo.[dbo].[msgr] WITH(INDEX=[bla]) -- It forces usage of NCI
    GO
    
    SELECT COUNT(id)
    FROM dbo.[dbo].[msgr] WITH(INDEX=[PK_msgr]) -- It forces usage of CI
    GO
    
    SET STATISTICS IO OFF;
    GO
    

    If there is a lot of data within msgr table then STATISTICS IO will show different LIO (logical IO), with less LIO for NCI query.