sql-server sql-execution-plan non-clustered-index

Adding non-clustered index to a table to increase performance

I have table structure as below

CREATE TABLE [dbo].[AIRQUALITYTS2]
(
    [FeatureID] [nvarchar](20) NOT NULL,
    [ParameterID] [nvarchar](20) NOT NULL,
    [MeasurementDateTime] [datetime2](7) NOT NULL,
    [ParameterValue] [numeric](38, 8) NULL,
    [Remarks] [nvarchar](150) NULL,

    CONSTRAINT [PK_AIRQUALITYTS2] 
        PRIMARY KEY CLUSTERED ([FeatureID] ASC, [ParameterID] ASC, [MeasurementDateTime] ASC)
                    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, 
                          IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, 
                          ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

When I execute this query:

set statistics io on

SELECT 
    COUNT(featureid), featureid 
FROM
    AIRQUALITYTS2 
WHERE
    FeatureID LIKE 'AS%' 
    AND ParameterID = 'AP2' 
    AND YEAR(MeasurementDateTime) = 2015
GROUP BY 
    FeatureID
ORDER BY 
    FeatureID

I see the logical records 101871 and query execution plan is

But when I add non-clustered index on this table as

 CREATE NONCLUSTERED INDEX non_fidpidmdate
     ON [dbo].[AIRQUALITYTS2] ([ParameterID], [FeatureID])
     INCLUDE ([MeasurementDateTime])

When I execute same query I see logical records reads 4636 only and is very fast and query execution plan is

Question 1: when there is less logical records in second query.

Question 2: Why first query is using clustered index scan as displayed in first image though it has clustered index on featureid,ParameterID and MeasurementDateTime while after adding non-cluster index it uses Index Seek (Non-Clustered) second image as displayed in images

Note: I have change where clause to

MeasurementDateTime >= '2004-01-01 00:00:00' 
and MeasurementDateTime <= '2004-12-31 00:00:00'

to make it sargable but still the results are the same.

Solution

In your original CREATE TABLE where you create the PRIMARY KEY CLUSTERED, it specifies the columns to cluster on, in the order they are clustered (stored) in.

[FeatureID]
[ParameterID]
[MeasurementDateTime]

If you run a query with a WHERE clause that includes a specific FeatureID then it would be able to seek to that part of the index.

But you haven't done that in the query. You've used WHERE FeatureID LIKE 'AS%' ...

The query engine cannot seek, because that LIKE with a trailing wildcard % means it has to scan across all the FeatureIDs that start with the letters AS and then within each of those nodes in the tree see if there are records that match ParameterID = 'AP2' AND YEAR(MeasurementDateTime) = 2015.

In your Non-Clustered index, you've used a different column order:

[ParameterID]
[FeatureID]

When you run the same query, it can seek because you've specified an exact ParameterID in the WHERE clause.

Ordering is important! SQL Indexes are sortof B-Tree data structures, and you can't physically store them (or traverse them) in different orderings without creating multiple indexes. Creating too many indexes can be too much overhead for the database, so yes create ones that help the majority of your queries, but don't create too many. Mostly this involves knowing what sort of queries are frequently run against your database and tuning accordingly.