In most cases, when I tried to remove OR condition and replace them with a UNION (which holds each of the conditions separately), it performed significantly better, as those parts of the query were index-able again.
Is there a rule of thumb (and maybe some documentation to support it) on when this 'trick' stops being useful? Will it be useful for 2 OR conditions? for 10 OR conditions? As the amount of UNIONs increases, and the UNION distinct part may have its own overhead.
What would be your rule of thumb on this?
Small example of the transformation:
SELECT
a, b
FROM
tbl
WHERE
a = 1 OR b = 2
Transformed to:
(SELECT
tbl.a, tbl.b
FROM
tbl
WHERE
tbl.b = 2)
UNION DISTINCT
(SELECT
tbl.a, tbl.b
FROM
tbl
WHERE
tbl.a = 1)
I suggest there is no useful Rule of Thumb (RoT). Here is why...
As you imply, more UNIONs
implies slower work, while more ORs
does not (at least not much). The SELECTs
of a union are costly because they are separate. I would estimate that a UNION
of N SELECTs
takes about N+1 or N+2 units of time, where one indexed SELECT
takes 1 unit of time. In contrast, multiple ORs
does little to slow down the query, since fetching all rows of the table is the costly part.
How fast each SELECT
of a UNION
runs depends on how good the index is and how few rows are fetched. This can vary significantly. (Hence, it makes it hard to devise a RoT.)
A UNION
starts by generating a temp table into which each SELECT
adds the rows it finds. This is some overhead. In newer versions (5.7.3 / MariaDB 10.1), there are limited situations where the temp table can be avoided. (This eliminates the hypothetical +1 or +2, thereby adding more complexity into devising a RoT.)
If it is UNION DISTINCT
(the default) instead of UNION ALL
, there needs to be a dedup-pass, probably involving a sort over the temp table. Note: This means that the even the new versions cannot avoid the temp table. UNION DISTINCT
precisely mimics the OR
, yet you may know that ALL
would give the same answer.