Search code examples
sqlrelational-database

Are these two database queries logically equivalent?


I have a query with an OR that runs poorly (i.e. a cost of 138 units):

SELECT *
FROM Customers
WHERE (FirstName LIKE 'Ian%') OR (LastName LIKE 'Boyd%')

But this query that returns the same runs much better (i.e. 0.6 units):

SELECT *
FROM Customers
WHERE (FirstName LIKE 'Ian%')

UNION

SELECT *
FROM Customers
WHERE (LastName LIKE 'Boyd%')

These two queries are logically equivalent or identical--from the standpoint of what information I'm asking the relational database engine for. But if that's the case, the modern sophisticated query optimizer should understand all this, and not run the query any differently. And yet it does; which we know we can all chalk up to quirky query optimizers. So maybe they're not equivalent.

Are these two database queries logically equivalent?

CREATE TABLE Customers 
( 
    FirstName varchar(50) NOT NULL,
    LastName varchar(50) NOT NULL
)

CREATE INDEX IX_Customers_FirstName ON Customers (FirstName);
CREATE INDEX IX_Customers_LastNmae  ON Customers (LastName);

Solution

  • The two are logically identical assuming that customers has no duplicate rows. This is a reasonable assumption.

    The UNION version is faster probably because the SQL engine can use indexes for the LIKE patterns -- they don't begin with a wildcard.

    A faster and almost equivalent version is:

    SELECT *
    FROM Customers
    WHERE FirstName LIKE 'Ian%'
    UNION ALL
    SELECT *
    FROM Customers
    WHERE LastName LIKE 'Boyd%' AND FirstName NOT LIKE 'Ian%';
    

    The only issue here is if FirstName is NULL. In that case, the logic would filter out even matching last names. An exact equivalent needs to take this into account:

    SELECT *
    FROM Customers
    WHERE FirstName LIKE 'Ian%'
    UNION ALL
    SELECT *
    FROM Customers
    WHERE LastName LIKE 'Boyd%' AND
          (FirstName NOT LIKE 'Ian%' OR FirstName IS NULL);
    

    These versions should be faster because they use UNION ALL and not UNION. The latter incurs overhead for removing duplicates. However, the WHERE clause removes those duplicates without having to look across rows.