Search code examples
sql-serverindexingfiltered-index

Filtered Index in SQL Server missing predicate does not work as expected


I am currently experimenting with filtered indexes in SQL Server. I was trying to shrink a filtered index down by putting the following hint from BOL into practice:

A column in the filtered index expression does not need to be a key or included column in the filtered index definition if the filtered index expression is equivalent to the query predicate and the query does not return the column in the filtered index expression with the query results.

I have reproduced the problem in a small test script: My table looks as follows:

CREATE TABLE #test
(
    ID  BIGINT NOT NULL IDENTITY(1,1),
    ARCHIVEDATE DATETIME NULL,
    CLOSINGDATE DATETIME NULL,
    OBJECTTYPE INTEGER NOT NULL,
    ACTIVE BIT NOT NULL,
    FILLER1 CHAR(255) DEFAULT 'just a filler',
    FILLER2 CHAR(255) DEFAULT 'just a filler',
    FILLER3 CHAR(255) DEFAULT 'just a filler',
    FILLER4 CHAR(255) DEFAULT 'just a filler',
    FILLER5 CHAR(255) DEFAULT 'just a filler',
    CONSTRAINT test_pk PRIMARY KEY CLUSTERED (ID ASC)
);

I need to optimize the following query:

SELECT  
    COUNT(*) 
FROM    
    #test 
WHERE       
        ARCHIVEDATE IS NULL 
    AND CLOSINGDATE IS NOT NULL 
    AND ISNULL(ACTIVE,1) != 0

Therefore I have built the following filtered index:

CREATE NONCLUSTERED INDEX idx_filterTest ON #test (/*ARCHIVEDATE ASC,*/CLOSINGDATE ASC) INCLUDE (ACTIVE) WHERE ARCHIVEDATE IS NULL;

ARCHIVEDATE is already in the filter and will not be used in SELECT thus it's not contained in the index keys or includes.

However, if I run the query I get the following plan: plan for query filters for operators

There's a key lookup in the clustered index for ARCHIVEDATE. Why is that so? I have reproduced this behaviour on SQL Server 2008 and SQL Server 2016.

If I create the index with ARCHIVEDATE in the key I get away with just an index seek. So it seems to me as this paragraph in BOL doesn't always apply.

Here's my complete repro script:

--DROP TABLE #test;
CREATE TABLE #test
(
    ID  BIGINT NOT NULL IDENTITY(1,1),
    ARCHIVEDATE DATETIME NULL,
    CLOSINGDATE DATETIME NULL,
    OBJECTTYPE INTEGER NOT NULL,
    ACTIVE BIT NOT NULL,
    FILLER1 CHAR(255) DEFAULT 'just a filler',
    FILLER2 CHAR(255) DEFAULT 'just a filler',
    FILLER3 CHAR(255) DEFAULT 'just a filler',
    FILLER4 CHAR(255) DEFAULT 'just a filler',
    FILLER5 CHAR(255) DEFAULT 'just a filler',
    CONSTRAINT test_pk PRIMARY KEY CLUSTERED (ID ASC)
);



INSERT INTO #test
(ARCHIVEDATE, CLOSINGDATE, OBJECTTYPE, ACTIVE)
SELECT TOP 200
    NULL,
    dates.calcDate,
    4711,
    dates.number%2
FROM
    (
        SELECT
            /* Erzeugen des Datums durch Addieren der jeweiligen Sequenznummer zum StartDate */
            DATEADD(DAY, seq.number, '20120101') AS calcDate, number
        FROM
        (
            /* Abfrage zur Erstellung einer Nummernsequenz von 0 bis 9999. Dient als Basis zur Aufbereitung aller Datumswerte im Zeitraum. Die Sequenz reicht für einen Zeitraum von ca. 30 Jahren aus. */
            SELECT
                a.num * 1000 + b.num * 100 + c.num * 10 + d.num AS number
            FROM
                        ( SELECT 0 AS num UNION ALL SELECT 1 AS num UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
            CROSS JOIN  ( SELECT 0 AS num UNION ALL SELECT 1 AS num UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
            CROSS JOIN  ( SELECT 0 AS num UNION ALL SELECT 1 AS num UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) c
            CROSS JOIN  ( SELECT 0 AS num UNION ALL SELECT 1 AS num UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) d
        ) seq 
        WHERE
            /* Einschränkung der Nummernsequenz auf die Anzahl der Tage im gewünschten Aufbereitungszeitraum */
            seq.number <= 5000
    ) dates
ORDER BY
    dates.number
;



INSERT INTO #test
(ARCHIVEDATE, CLOSINGDATE, OBJECTTYPE, ACTIVE)
SELECT TOP 1000
    dates.calcDate + 3,
    dates.calcDate,
    4711,
    dates.number%2
FROM
    (
        SELECT
            /* Erzeugen des Datums durch Addieren der jeweiligen Sequenznummer zum StartDate */
            DATEADD(DAY, seq.number, '20120101') AS calcDate, number
        FROM
        (
            /* Abfrage zur Erstellung einer Nummernsequenz von 0 bis 9999. Dient als Basis zur Aufbereitung aller Datumswerte im Zeitraum. Die Sequenz reicht für einen Zeitraum von ca. 30 Jahren aus. */
            SELECT
                a.num * 1000 + b.num * 100 + c.num * 10 + d.num AS number
            FROM
                        ( SELECT 0 AS num UNION ALL SELECT 1 AS num UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
            CROSS JOIN  ( SELECT 0 AS num UNION ALL SELECT 1 AS num UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
            CROSS JOIN  ( SELECT 0 AS num UNION ALL SELECT 1 AS num UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) c
            CROSS JOIN  ( SELECT 0 AS num UNION ALL SELECT 1 AS num UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) d
        ) seq 
        WHERE
            /* Einschränkung der Nummernsequenz auf die Anzahl der Tage im gewünschten Aufbereitungszeitraum */
            seq.number <= 5000
    ) dates
ORDER BY
    dates.number
;


INSERT INTO #test
(ARCHIVEDATE, CLOSINGDATE, OBJECTTYPE, ACTIVE)
SELECT TOP 100000
    dates.calcDate,
    NULL,
    4711,
    dates.number%2
FROM
    (
        SELECT
            /* Erzeugen des Datums durch Addieren der jeweiligen Sequenznummer zum StartDate */
            DATEADD(DAY, seq.number, '20120101') AS calcDate, number
        FROM
        (
            /* Abfrage zur Erstellung einer Nummernsequenz von 0 bis 9999. Dient als Basis zur Aufbereitung aller Datumswerte im Zeitraum. Die Sequenz reicht für einen Zeitraum von ca. 30 Jahren aus. */
            SELECT
                a.num * 1000 + b.num * 100 + c.num * 10 + d.num AS number
            FROM
                        ( SELECT 0 AS num UNION ALL SELECT 1 AS num UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
            CROSS JOIN  ( SELECT 0 AS num UNION ALL SELECT 1 AS num UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
            CROSS JOIN  ( SELECT 0 AS num UNION ALL SELECT 1 AS num UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) c
            CROSS JOIN  ( SELECT 0 AS num UNION ALL SELECT 1 AS num UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) d
        ) seq 
        WHERE
            /* Einschränkung der Nummernsequenz auf die Anzahl der Tage im gewünschten Aufbereitungszeitraum */
            seq.number <= 5000
    ) dates
ORDER BY
    dates.number
;


--DROP INDEX idx_filterTest ON #test;
--CREATE NONCLUSTERED INDEX idx_filterTest ON #test (ARCHIVEDATE ASC,CLOSINGDATE ASC) INCLUDE (ACTIVE) WHERE ARCHIVEDATE IS NULL;
CREATE NONCLUSTERED INDEX idx_filterTest ON #test (/*ARCHIVEDATE ASC,*/CLOSINGDATE ASC) INCLUDE (ACTIVE) WHERE ARCHIVEDATE IS NULL;



SELECT  
    COUNT(*) 
FROM    
    #test 
WHERE       
        ARCHIVEDATE IS NULL 
    AND CLOSINGDATE IS NOT NULL 
    AND ISNULL(ACTIVE,1) != 0;

Solution

  • This is a bug in the optimizer, specifically in the way it handles IS NULL filters. Here's a simpler repro:

    CREATE TABLE #T(ID INT IDENTITY PRIMARY KEY, X INT);
    INSERT #T(X) SELECT TOP(10000) message_id FROM sys.messages WHERE message_id <> 1;
    INSERT #T(X) VALUES (1);
    INSERT #T(X) VALUES (NULL);
    CREATE INDEX IX_#T_X_null ON #T(ID) WHERE X IS NULL;
    CREATE INDEX IX_#T_X_1 ON #T(ID) WHERE X = 1;
    

    Clearly the following query is covered by IX_#T_X_null:

    SELECT MIN(ID) FROM #T WHERE X IS NULL;
    

    And the optimizer indeed picks it, but we get an execution plan where a superfluous clustered index lookup is inserted. But:

    SELECT MIN(ID) FROM #T WHERE X = 1;
    

    Now we get a query without a clustered index lookup. When IS NULL is involved, the optimizer seems to recognize that the filtered index applies, but is then unable to propagate the condition to a later step. We can clearly see this if we include the column with the index:

    CREATE INDEX IX_#T_X_null ON #T(ID, X) WHERE X IS NULL;
    

    If you now compare the execution plans of the WHERE X = 1 and WHERE X IS NULL queries, you'll see that in the case of X IS NULL, the optimizer adds a predicate into the index scan, which it doesn't do with X = 1.

    And delving into a bit a bit further, with this specific setup, you can find this is a known issue, already reported on Connect. According to Microsoft, however, "this is actually not a bug but rather a known gap in functionality" (which I suppose is technically true, since the results are not incorrect, it just doesn't perform as well as it could). Also, "this is now an active DCR for a future release of SQL Server", but that was 6 years ago, and the ticket is closed as "Won't Fix" -- so don't hold your breath.

    Unfortunately, the workaround is indeed to include the column in the index -- I'd make it an included column and not a key, since that adds overhead to the non-leaf levels:

    CREATE NONCLUSTERED INDEX idx_filterTest ON #test (CLOSINGDATE ASC)
    INCLUDE (ACTIVE, ARCHIVEDATE) 
    WHERE ARCHIVEDATE IS NULL;
    

    I say "unfortunately" because this always-NULL column will still pointlessly take up row space (since DATETIME is a fixed size data type). Even so, it's probably miles better than getting extra I/O from clustered index lookups. Also, the overhead can be reduced to almost nothing by compressing the index (even row compression will do).