I have a query with an OR
that runs poorly (i.e. a cost of 138 units):
SELECT *
FROM Customers
WHERE (FirstName LIKE 'Ian%') OR (LastName LIKE 'Boyd%')
But this query that returns the same runs much better (i.e. 0.6 units):
SELECT *
FROM Customers
WHERE (FirstName LIKE 'Ian%')
UNION
SELECT *
FROM Customers
WHERE (LastName LIKE 'Boyd%')
These two queries are logically equivalent or identical--from the standpoint of what information I'm asking the relational database engine for. But if that's the case, the modern sophisticated query optimizer should understand all this, and not run the query any differently. And yet it does; which we know we can all chalk up to quirky query optimizers. So maybe they're not equivalent.
Are these two database queries logically equivalent?
CREATE TABLE Customers
(
FirstName varchar(50) NOT NULL,
LastName varchar(50) NOT NULL
)
CREATE INDEX IX_Customers_FirstName ON Customers (FirstName);
CREATE INDEX IX_Customers_LastNmae ON Customers (LastName);
The two are logically identical assuming that customers
has no duplicate rows. This is a reasonable assumption.
The UNION
version is faster probably because the SQL engine can use indexes for the LIKE
patterns -- they don't begin with a wildcard.
A faster and almost equivalent version is:
SELECT *
FROM Customers
WHERE FirstName LIKE 'Ian%'
UNION ALL
SELECT *
FROM Customers
WHERE LastName LIKE 'Boyd%' AND FirstName NOT LIKE 'Ian%';
The only issue here is if FirstName
is NULL
. In that case, the logic would filter out even matching last names. An exact equivalent needs to take this into account:
SELECT *
FROM Customers
WHERE FirstName LIKE 'Ian%'
UNION ALL
SELECT *
FROM Customers
WHERE LastName LIKE 'Boyd%' AND
(FirstName NOT LIKE 'Ian%' OR FirstName IS NULL);
These versions should be faster because they use UNION ALL
and not UNION
. The latter incurs overhead for removing duplicates. However, the WHERE
clause removes those duplicates without having to look across rows.