Search code examples
sqlsql-serverperformancequery-optimizationdatabase-performance

How can this change be making my query slow (OR vs UNION) and can I fix it?


I've just been debugging a slow SQL query.

It's a join between 2 tables, with a WHERE clause conditioning on either a property of 1 table OR the other.

If I re-write it as a UNION then it's suddenly 2 orders of magnitude faster, even though those 2 queries produce identical outputs:

DECLARE @UserId UNIQUEIDENTIFIER = '0019813D-4379-400D-9423-56E1B98002CB'

SELECT *
    FROM Bookings
    LEFT JOIN BookingPricings ON Booking = Bookings.ID
    WHERE (BookingPricings.[Owner] in (@UserId) OR Bookings.MixedDealBroker in (@UserId))

--Execution time: ~4000ms



SELECT *
    FROM Bookings
    LEFT JOIN BookingPricings ON Booking = Bookings.ID
    WHERE (BookingPricings.[Owner] in (@UserId))
UNION
SELECT *
    FROM Bookings
    LEFT JOIN BookingPricings ON Booking = Bookings.ID
    WHERE (Bookings.MixedDealBroker in (@UserId))

--Execution time: ~70ms

This seems rather surprising to me! I would have expected the SQL compiler to be entirely capable of identifying that the 2nd form was equivalent and would have used that compilation approach if it were available.

Some context notes:

  • I've checked and IN (@UserId) vs = @UserId makes no difference.
  • Nor does JOIN vs LEFT JOIN.
  • Those tables each have 100,000s records, and the filter cuts it down to ~100.
  • In the slow version it seems to be reading every row of both tables.

So:

  • Does anyone have any ideas for how this comes about.
  • What (if anything) can I do to fix the performance without just re-writing the query as a series of UNIONs (not viable for a variety of reasons.)

=-=-=-=-=-=-=

Execution Plans: enter image description here


Solution

  • This is a common limitation of SQL engines, not just in SQL Server, but also other database systems as well. The OR complicates the predicate enough that the execution plan selected isn't always ideal. This probably relates to the fact that only one index can be seeked into per instance of a table object at a time (for the most part), or in your specific case, your OR predicate is across two different tables, and other factors with how SQL engines are designed.

    By using a UNION clause, you now have two instances of the Bookings table referenced, which can individually be seeked on separately in the most efficient way possible. That allows the SQL Engine to pick a better execution plan to serve you query.

    This is pretty much just one of those things that are the way they are because that's just the way it is, and you need to remember the UNION clause workaround for future encounters of this kind of performance issue.


    Also, in response to your comment:

    I don't understand how the difference can affect the EP, given that the 2 different "phrasings" of the query are identical?

    A new execution plan is generated every time one doesn't exist in the plan cache for a given query, essentially. The way the Engine determines if a plan for a query is already cached is based on the exact hashing of that query statement, so even an extra space character at the end of the query can result in a new plan being generated. Theoretically that plan can be different. So a different written query (despite being logically the same) can surely result in a different execution plan.

    There are other reasons a plan can change on re-generation too, such as different data and statistics of that data, in the tables referenced in the query between executions. But these reasons don't really apply to your question above.