Tags: mysql, sql, performance, union

SQL Performance UNION vs OR


I just read part of an optimization article and segfaulted on the following statement:

When using SQL, replace statements that use OR with a UNION:

select username from users where company = 'bbc' or company = 'itv';

to:

select username from users where company = 'bbc' union
select username from users where company = 'itv';

From a quick EXPLAIN:

Using OR:

[image: EXPLAIN output for the OR query, not preserved]

Using UNION:

[image: EXPLAIN output for the UNION query, not preserved]

Doesn't this mean UNION does double the work?

While I appreciate that UNION may be more performant for certain RDBMSes and certain table schemas, it is not categorically true, as the author suggests it is.

Question

Am I wrong?


Solution

  • Either the article you read used a bad example, or you misinterpreted their point.

    select username from users where company = 'bbc' or company = 'itv';
    

    This is equivalent to:

    select username from users where company IN ('bbc', 'itv');
    

    MySQL can use an index on company for this query just fine. There's no need to do any UNION.
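A minimal sketch of that equivalence, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: SQLite's planner is not MySQL's, so the plan text will look different). The `users` rows and the index name `idx_users_company` are hypothetical. It checks that the OR and IN forms return the same rows and prints each query plan; on SQLite the IN form should show a search using the index rather than a full scan.

```python
import sqlite3

# Assumption: SQLite (Python's built-in sqlite3 module) stands in for MySQL.
# The table, the index name and the rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, company TEXT, city TEXT);
    CREATE INDEX idx_users_company ON users (company);
""")
conn.executemany(
    "INSERT INTO users (username, company, city) VALUES (?, ?, ?)",
    [("alice", "bbc", "London"), ("bob", "itv", "Leeds"),
     ("carol", "sky", "London"), ("dave", "bbc", "Manchester")],
)

or_query = "SELECT username FROM users WHERE company = 'bbc' OR company = 'itv'"
in_query = "SELECT username FROM users WHERE company IN ('bbc', 'itv')"


def rows(sql):
    # Sort so the comparison does not depend on the order rows come back in.
    return sorted(r[0] for r in conn.execute(sql))


def plan(sql):
    # EXPLAIN QUERY PLAN yields 4-column rows; the last column is the readable detail.
    return [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql)]


same_rows = rows(or_query) == rows(in_query)
print(same_rows)  # True
print(plan(or_query))
print(plan(in_query))
```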

    The more tricky case is where you have an OR condition that involves two different columns.

    select username from users where company = 'bbc' or city = 'London';
    

    Suppose there's an index on company and a separate index on city. Given that MySQL usually uses only one index per table in a given query (its index merge optimization can sometimes combine two indexes, but the optimizer does not always choose it), which index should it use? If it uses the index on company, it would still have to do a table-scan to find rows where city is London. If it uses the index on city, it would still have to do a table-scan to find rows where company is bbc.

    The UNION solution is for this type of case.

    select username from users where company = 'bbc' 
    union
    select username from users where city = 'London';
    

    Now each subquery can use its own index for its search, and the UNION combines the results.
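A small sketch of the two-column case, again assuming SQLite as a stand-in for MySQL, with hypothetical index names `idx_users_company` and `idx_users_city` and made-up rows. It prints the plan of each UNION branch (each is a plain equality lookup, so each can use its own index) and checks that the UNION finds the same usernames as the OR query.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the table, index names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, company TEXT, city TEXT);
    CREATE INDEX idx_users_company ON users (company);
    CREATE INDEX idx_users_city ON users (city);
""")
conn.executemany(
    "INSERT INTO users (username, company, city) VALUES (?, ?, ?)",
    [("alice", "bbc", "London"), ("bob", "itv", "Leeds"),
     ("carol", "sky", "London"), ("dave", "bbc", "Manchester")],
)

branch_company = "SELECT username FROM users WHERE company = 'bbc'"
branch_city = "SELECT username FROM users WHERE city = 'London'"
union_query = f"{branch_company} UNION {branch_city}"
or_query = "SELECT username FROM users WHERE company = 'bbc' OR city = 'London'"


def plan(sql):
    # Keep only the human-readable detail column of EXPLAIN QUERY PLAN.
    return [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Each branch filters on a single indexed column, so each can use its own index.
company_plan = plan(branch_company)
city_plan = plan(branch_city)

# Compared as sets: both forms find the same usernames in this data.
union_rows = {r[0] for r in conn.execute(union_query)}
or_rows = {r[0] for r in conn.execute(or_query)}
print(company_plan, city_plan)
print(union_rows == or_rows)  # True
```

The comparison is by set on purpose: UNION removes duplicate rows from the combined result, so if two different users shared a username, the OR query would return that name twice while the UNION would return it once. Selecting a unique column such as the primary key alongside `username` avoids that difference.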

    An anonymous user proposed an edit to my answer above, but a moderator rejected it; it should have been a comment, not an edit. The proposed edit claimed that UNION has to sort the result set to eliminate duplicate rows, which makes the query run slower, so the index optimization is a wash.

    UNION does have to eliminate duplicates, and to do that it must sort or otherwise deduplicate the combined rows, so there is some overhead. But usually it only has to process a small result set.

    There might be cases where the WHERE clauses match a significant portion of the table, and sorting during UNION is as expensive as simply doing the table-scan. But it's more common for the result set to be reduced by the indexed searches, so the sorting of a small set of rows is much less costly than the table-scan of a large set of rows.
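A short sketch of why the deduplication step exists, with SQLite standing in for MySQL and hypothetical rows: a row that satisfies both WHERE clauses comes back from both branches, and UNION has to collapse it. The cost of that step depends on how many rows the branches return, not on the size of the table, which is why it tends to be cheap when the indexed searches are selective.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, company TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO users (username, company, city) VALUES (?, ?, ?)",
    [("alice", "bbc", "London"), ("bob", "itv", "Leeds"),
     ("carol", "sky", "London"), ("dave", "bbc", "Manchester")],
)

branch_company = "SELECT username FROM users WHERE company = 'bbc'"
branch_city = "SELECT username FROM users WHERE city = 'London'"

# Running the branches separately: alice matches both conditions,
# so she appears once in each branch's output.
concatenated = (
    [r[0] for r in conn.execute(branch_company)]
    + [r[0] for r in conn.execute(branch_city)]
)

# UNION removes that duplicate; this is the deduplication work discussed above.
union_rows = [r[0] for r in conn.execute(f"{branch_company} UNION {branch_city}")]

print(sorted(concatenated))  # ['alice', 'alice', 'carol', 'dave']
print(sorted(union_rows))    # ['alice', 'carol', 'dave']
```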

    The difference depends on the data in the table and the terms being searched. The only way to determine the best solution for a given query is to try both methods in the MySQL query profiler and compare their performance.
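The "try both and measure" advice can be sketched as a small timing harness. This one assumes SQLite as a stand-in for MySQL and uses a hypothetical schema with randomly generated data, so the absolute numbers say nothing definitive about MySQL; what carries over is the method of running both forms against representative data and comparing them. In MySQL itself, `EXPLAIN ANALYZE` (available from 8.0.18) reports actual execution timings per plan step.

```python
import random
import sqlite3
import timeit

# Assumptions: SQLite stands in for MySQL, and the schema, index names and
# generated data are hypothetical. Absolute timings will not carry over to MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, company TEXT, city TEXT);
    CREATE INDEX idx_users_company ON users (company);
    CREATE INDEX idx_users_city ON users (city);
""")

random.seed(42)  # Reproducible synthetic data.
companies = ["bbc", "itv", "sky", "c4", "c5"]
cities = ["London", "Leeds", "Manchester", "Bristol", "Glasgow"]
conn.executemany(
    "INSERT INTO users (username, company, city) VALUES (?, ?, ?)",
    [(f"user{i}", random.choice(companies), random.choice(cities)) for i in range(20_000)],
)

QUERIES = {
    "or": "SELECT username FROM users WHERE company = 'bbc' OR city = 'London'",
    "union": (
        "SELECT username FROM users WHERE company = 'bbc' "
        "UNION SELECT username FROM users WHERE city = 'London'"
    ),
}


def best_time(sql, repeat=5, number=10):
    # Best-of-N average per execution; fetchall() forces the whole result to be read.
    timer = timeit.Timer(lambda: conn.execute(sql).fetchall())
    return min(timer.repeat(repeat=repeat, number=number)) / number


timings = {name: best_time(sql) for name, sql in QUERIES.items()}
for name, seconds in timings.items():
    print(f"{name:>5}: {seconds * 1000:.3f} ms per query")
```

To use it, swap in the real table and the terms actually being searched; which form wins depends on how selective each condition is.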