Search code examples
mysqlsqlquery-performance

When UNION stops being superior than OR conditions in SQL queries?


In most cases, when I tried to remove OR condition and replace them with a UNION (which holds each of the conditions separately), it performed significantly better, as those parts of the query were index-able again.

Is there a rule of thumb (and maybe some documentation to support it) on when this 'trick' stops being useful? Will it be useful for 2 OR conditions? for 10 OR conditions? As the amount of UNIONs increases, and the UNION distinct part may have its own overhead.

What would be your rule of thumb on this?

Small example of the transformation:

SELECT 
    a, b
FROM
    tbl
WHERE
    a = 1 OR b = 2

Transformed to:

(SELECT 
    tbl.a, tbl.b
FROM
    tbl
WHERE
    tbl.b = 2) 
UNION DISTINCT 
(SELECT 
    tbl.a, tbl.b
FROM
    tbl
WHERE
    tbl.a = 1)

Solution

  • I suggest there is no useful Rule of Thumb (RoT). Here is why...

    • As you imply, more UNIONs implies slower work, while more ORs does not (at least not much). The SELECTs of a union are costly because they are separate. I would estimate that a UNION of N SELECTs takes about N+1 or N+2 units of time, where one indexed SELECT takes 1 unit of time. In contrast, multiple ORs does little to slow down the query, since fetching all rows of the table is the costly part.

    • How fast each SELECT of a UNION runs depends on how good the index is and how few rows are fetched. This can vary significantly. (Hence, it makes it hard to devise a RoT.)

    • A UNION starts by generating a temp table into which each SELECT adds the rows it finds. This is some overhead. In newer versions (5.7.3 / MariaDB 10.1), there are limited situations where the temp table can be avoided. (This eliminates the hypothetical +1 or +2, thereby adding more complexity into devising a RoT.)

    • If it is UNION DISTINCT (the default) instead of UNION ALL, there needs to be a dedup-pass, probably involving a sort over the temp table. Note: This means that the even the new versions cannot avoid the temp table. UNION DISTINCT precisely mimics the OR, yet you may know that ALL would give the same answer.