I have been scratching my head on this one.
I run as simple select count(id)
on a table with id
as clustered integer primary key and the SQL Optimizer totally ignores the primary key in it's query execution plan, in favor of an index on a date field.... ???
Actual table:
CREATE TABLE [dbo].[msgr](
[id] [int] IDENTITY(1,1) NOT NULL,
[dt] [datetime2](3) NOT NULL CONSTRAINT [DF_msgr_dt] DEFAULT (sysdatetime()),
[uid] [int] NOT NULL,
[msg] [varchar](7000) NOT NULL CONSTRAINT [DF_msgr_msg] DEFAULT (''),
[type] [tinyint] NOT NULL,
[cid] [int] NOT NULL CONSTRAINT [DF_msgr_cid] DEFAULT ((0)),
[via] [tinyint] NOT NULL,
[msg_id] [bigint] NOT NULL,
CONSTRAINT [PK_msgr] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Please what could be the reason for this?
1) In my opinion, the key point here is that for clustered tables (tables which have a clustered index = the main data structure = is that data structure that store table data = clustered index is the table itself) the every non-clustered index include also the the key of clustered index. This means that
CREATE [UNIQUE] NONCLUSTERED INDEX bla
ON [dbo].[msgr] (uid)
is basically the same thing as
CREATE [UNIQUE] NONCLUSTERED INDEX bla
ON [dbo].[msgr] (uid)
INCLUDE (id) -- id = key of clustered index
So, for such tables, every record from non-clustered indexes on the leaf pages includes also the key of clustered index. This way, within every non-clustered index and for every leaf record SQL Server store also some kind of pointer to the main data structure.
2) This means that SELECT COUNT(id) FROM dbo.msgr
can be executed using CI but also using NCI because both indexes include the id
(key of clustered index) column.
As a secondary note within this topic, because IDENTITY
property (for id
column) means a mandatory column (NOT NULL
), COUNT(id)
is the same thing as COUNT(*)
. Also, this means that COUNT(msg_id)
(also a mandatory / NOT NULL
) column is the same thing as COUNT(*)
. So, it's very likely that execution plan for SELECT COUNT(msg_id) FROM dbo.msgr
will use the same NCI (for example bla
).
3) Non-clustered indexes have smaller size than clustered index. This means also less IO => It's better from performance point of view to use the NCI than CI.
I would do following simple test:
SET STATISTICS IO ON;
GO
SELECT COUNT(id)
FROM dbo.[dbo].[msgr] WITH(INDEX=[bla]) -- It forces usage of NCI
GO
SELECT COUNT(id)
FROM dbo.[dbo].[msgr] WITH(INDEX=[PK_msgr]) -- It forces usage of CI
GO
SET STATISTICS IO OFF;
GO
If there is a lot of data within msgr
table then STATISTICS IO
will show different LIO
(logical IO), with less LIO for NCI query.