I've just been debugging a slow SQL query.
It's a join between 2 tables, with a WHERE clause conditioning on either a property of 1 table OR the other.
If I re-write it as a UNION then it's suddenly 2 orders of magnitude faster, even though those 2 queries produce identical outputs:
DECLARE @UserId UNIQUEIDENTIFIER = '0019813D-4379-400D-9423-56E1B98002CB'
SELECT *
FROM Bookings
LEFT JOIN BookingPricings ON Booking = Bookings.ID
WHERE (BookingPricings.[Owner] in (@UserId) OR Bookings.MixedDealBroker in (@UserId))
--Execution time: ~4000ms
SELECT *
FROM Bookings
LEFT JOIN BookingPricings ON Booking = Bookings.ID
WHERE (BookingPricings.[Owner] in (@UserId))
UNION
SELECT *
FROM Bookings
LEFT JOIN BookingPricings ON Booking = Bookings.ID
WHERE (Bookings.MixedDealBroker in (@UserId))
--Execution time: ~70ms
This seems rather surprising to me! I would have expected the SQL compiler to be entirely capable of identifying that the 2nd form was equivalent and would have used that compilation approach if it were available.
Some context notes:
IN (@UserId)
vs = @UserId
makes no difference.JOIN
vs LEFT JOIN
.So:
UNIONs
(not viable for a variety of reasons.)=-=-=-=-=-=-=
This is a common limitation of SQL engines, not just in SQL Server, but also other database systems as well. The OR
complicates the predicate enough that the execution plan selected isn't always ideal. This probably relates to the fact that only one index can be seeked into per instance of a table object at a time (for the most part), or in your specific case, your OR
predicate is across two different tables, and other factors with how SQL engines are designed.
By using a UNION
clause, you now have two instances of the Bookings
table referenced, which can individually be seeked on separately in the most efficient way possible. That allows the SQL Engine to pick a better execution plan to serve you query.
This is pretty much just one of those things that are the way they are because that's just the way it is, and you need to remember the UNION
clause workaround for future encounters of this kind of performance issue.
Also, in response to your comment:
I don't understand how the difference can affect the EP, given that the 2 different "phrasings" of the query are identical?
A new execution plan is generated every time one doesn't exist in the plan cache for a given query, essentially. The way the Engine determines if a plan for a query is already cached is based on the exact hashing of that query statement, so even an extra space character at the end of the query can result in a new plan being generated. Theoretically that plan can be different. So a different written query (despite being logically the same) can surely result in a different execution plan.
There are other reasons a plan can change on re-generation too, such as different data and statistics of that data, in the tables referenced in the query between executions. But these reasons don't really apply to your question above.