Search code examples
sqlsql-serverquery-optimizationquery-planner

Where condition dramatically reduces performance unexpectedly


I am attempting to optimize a very large query that has a particularly bad join condition. I have reduced the larger query down to two tables to illustrate the negative behavior I am seeing.

CREATE TABLE #test1 (Id1 INT, Id2 NVARCHAR(256))
CREATE CLUSTERED INDEX PK_test1 ON #test1 (Id1, Id2)
-- Sample (1, "a|b|c|d")

CREATE TABLE #test2 (Id1 INT, Id2 NVARCHAR(256))
CREATE CLUSTERED INDEX PK_test2 ON #test2 (Id1, Id2)
-- Sample (1, "a|b")

The join I am attempting to optimize is a postfix match (query hints added to emulate the join order as these are a part of a much larger query).

DECLARE @id INT = 1

SELECT * 
FROM  #test1 t1
INNER [HASH] JOIN #test2 t2 ON t1.Id1 = t2.Id1
                            AND t1.Id2 LIKE t2.Id2 + '%'
-- This condition is the one that is causing issues
-- WHERE t1.Id1 = @id 
OPTION (FORCE ORDER)

Without the where clause, the join is able to use the HASH join hint:

[![no where clause](https://i.sstatic.net/fUpRA.png)](https://i.sstatic.net/fUpRA.png)

However with the where clause the optimizer is unable to build that query instead decides to do this:

With where clause

Substantially reducing the performance of the query.

As the where condition is reducing the data sets from both tables, I would have expected it to be strictly better, however it is completely changing the seek. It also prevents the join from being a hash or merge join, and it is not clear at all why that would be the case as it should just be reducing the search space for both tables.

The main question here is how can that where condition be having such a massive negative effect to the overall query plan when it shouldn't be changing the index access patterns


Solution

  • The reason why you see the error

    Msg 8622, Level 16, State 1, Line 12 Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.

    When adding the

    WHERE t1.Id1 = @id
    

    Is because

    FROM  #test1 t1
    INNER JOIN #test2 t2 ON t1.Id1 = t2.Id1
        AND t1.Id2 LIKE t2.Id2 + '%'
    WHERE t1.Id1 = @id 
    

    is the same as

    FROM  (#test1 t1 WHERE t1.Id1 = @id) t1
    CROSS JOIN (#test2 t2 WHERE t2.Id1= @id) t2 
    WHERE t1.Id2 LIKE t2.Id2 + '%'
    

    So you lose the equi join that is necessary for a hash join as discussed in Implied Predicates and Query Hints (Craig Freedman).

    On the face of it the hash join doesn't really give you anything as the join condition is on Id1 and all of the rows on the build side have the same value for this and fall in the same hash bucket so it has to replay all of the rows in the build side for all of the rows in the probe side anyway.

    You could try using the NO_PERFORMANCE_SPOOL hint to get rid of the spool in the plan