Search code examples
sql-serversql-server-2016

Prevent SQL from over-optimising constraint


My question: is there a way to tell SQL Server: optimise(query part A), optimise(query part B) and ONLY THEN return the intersection of the two? Currently it is optimising(A+B), and the result of that is 30x slower, when compared to executing the two parts of the query.

I could look at the execution plan of the combined query and add indexes/try to restructure it etc., but that is not my question here. Is there an alternative to that - maybe there's some query engine hints I could add? or another way I could restructure the query so the engine doesn't try to optimise it as a whole?

I have two queries that separately run in a few seconds and return only a few rows. When I combine them into a single query, it executes for 5+ minutes.

SELECT trk.* 
FROM dbo.[vwNewRMA] trk 
WHERE trk.[Repair Centre] = 'US' 
  AND trk.IsWorkOrder IS NULL

Part A, runs in 5 seconds and returns 240 rows.

SELECT Customer 
FROM [RMACustomer] 
WHERE [USServiceCenterCustomer] = 1

Part B, runs in 1 second and returns 32 rows

However, when I try to filter the first record set by the second, my best attempt takes more than 5 minutes:

SELECT t1.* 
FROM 
    (SELECT trk.* 
     FROM dbo.[vwNewRMA] trk 
     WHERE trk.[Repair Centre] = 'US' 
       AND trk.IsWorkOrder IS NULL) t1
WHERE 
    t1.Customer IN (SELECT Customer 
                    FROM [RMACustomer] 
                    WHERE [USServiceCenterCustomer] = 1)

More than 16 minutes:

SELECT trk.* 
FROM dbo.[vwNewRMA] trk 
WHERE trk.[Repair Centre] = 'US' 
  AND trk.IsWorkOrder IS NULL 
  AND trk.Customer IN (SELECT Customer 
                       FROM [RMACustomer] 
                       WHERE [USServiceCenterCustomer] = 1)

More than 16 minutes:

SELECT trk.* 
FROM dbo.[vwNewRMA] trk 
INNER JOIN [RMACustomer] cust ON cust.Customer = trk.Customer 
                              AND cust.[USServiceCenterCustomer] = 1
WHERE trk.[Repair Centre] = 'US' 
  AND trk.IsWorkOrder IS NULL

Is there a trick I could use to combine them into a single statement that forces the query engine to not overthink it?


Solution

  • You could also try using TOP or OFFSET to encourage some separation of the two subtrees.

    You might also get better results with ORDER BY Customer than ORDER BY (SELECT 0) as then potentially they can be merged joined together and this might make it less likely that it chooses unspooled nested loops and repeatedly re-evaluates one part.

    WITH PartA
         AS (SELECT trk.*
             FROM   dbo.[vwNewRMA] trk
             WHERE  trk.[Repair Centre] = 'US'
                    AND trk.IsWorkOrder IS NULL
             ORDER  BY (SELECT 0)
             OFFSET 0 ROWS),
         PartB
         AS (SELECT Customer
             FROM   [RMACustomer]
             WHERE  [USServiceCenterCustomer] = 1
             ORDER  BY (SELECT 0)
             OFFSET 0 ROWS)
    SELECT *
    FROM   PartA a
    WHERE  a.Customer IN (SELECT b.Customer
                          FROM   PartB b)
    

    e.g. compare the execution plans here

    enter image description here

    But unless you are using this in some context that disallows temp tables I'd just do that as it is more reliable and also assists with more accurate cardinality estimation.