I have written a PHP script that produces the following SQL query:
SELECT * FROM icecream WHERE
flavor = 'vanilla' AND color = 'purple' AND (quality = 'aaa')
OR flavor = 'marzipan' AND color = 'purple' AND (quality = 'aaa')
OR flavor = 'vanilla' AND color = 'purple' AND (quality = 'aaa')
The last and first bit of the query are identical.
That looks wrong to me, and I'd much rather run a query like
SELECT * FROM icecream WHERE
flavor = 'vanilla' AND color = 'purple' AND (quality = 'aaa')
OR flavor = 'marzipan' AND color = 'purple' AND (quality = 'aaa')
omitting the last condition.
That looks fine in this example, but suppose there are 23 "OR flavor..." parts instead, drawn from 100 different flavors, 50 different colors, and 20 different quality ratings.
Suddenly it becomes a lot more complex to create a nice looking SQL query without duplicate requests for the same data.
The current SQL query works. But should I clean up the query to eliminate the duplication, or should I leave it to the SQL engine to do it for me?
I mean, I can do it... but is it worth it? On one hand I want to pass clean SQL queries, but on the other it seems like this kind of thing is what an SQL engine is really designed to do.
Any suggestions?
You can answer your question by reading the query plan created by the MySQL optimizer, using EXPLAIN [EXTENDED] SELECT ... for each version of the query. If the plans are the same, then the server not only considers your queries logically equivalent, but has also reduced them down to the same query internally.
https://dev.mysql.com/doc/refman/5.7/en/explain.html
Your preference should be for the generated queries to be logically optimal, to avoid anything that reduces the options available to the optimizer (such as using column names as function arguments in WHERE), and to be eyeball debuggable. There's no need for cleverness.
(a = 1 AND b = 1 AND c = 5) OR
(a = 1 AND b = 1 AND c = 27)
...is entirely equivalent to...
(a = 1 AND b = 1) AND (c = 5 OR c = 27)
...or...
(a = 1 AND b = 1 AND c IN (5,27))
...and the optimizer will easily understand these conditions regardless of how they are expressed in the query. The resulting plan should be identical, so there's no need to prefer one over the others when generating dynamic queries. (Older versions of MySQL might not have handled the last form as well as the first two, but this should no longer be an issue.)
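The logical equivalence of these forms, and the harmlessness of a repeated OR branch, can be checked on a toy table. The sketch below is only an illustration under stated assumptions: it uses Python with an in-memory SQLite database as a stand-in for MySQL, and the table `t` and its data are hypothetical. It shows that the result sets match; it says nothing about query plans, which you would still compare with EXPLAIN on your own server.

```python
import sqlite3

# Hypothetical table t(a, b, c); SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER)"
)
rows = [(a, b, c) for a in (1, 2) for b in (1, 2) for c in (5, 27, 99)]
conn.executemany("INSERT INTO t (a, b, c) VALUES (?, ?, ?)", rows)

WHERE_FORMS = [
    "(a = 1 AND b = 1 AND c = 5) OR (a = 1 AND b = 1 AND c = 27)",
    "(a = 1 AND b = 1) AND (c = 5 OR c = 27)",
    "(a = 1 AND b = 1 AND c IN (5, 27))",
    # The first form again, with a redundant repeated branch appended.
    "(a = 1 AND b = 1 AND c = 5) OR (a = 1 AND b = 1 AND c = 27)"
    " OR (a = 1 AND b = 1 AND c = 5)",
]


def matching_ids(where):
    """Return the ids of the rows selected by the given WHERE expression."""
    cur = conn.execute(f"SELECT id FROM t WHERE {where} ORDER BY id")
    return [row[0] for row in cur.fetchall()]


results = [matching_ids(where) for where in WHERE_FORMS]
# Every form selects the same rows; the repeated branch adds nothing.
print(results)
```

A row either satisfies the WHERE expression or it does not, so a duplicated branch cannot make a row appear twice in the result.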
Importantly, WHERE, being a logical expression, has deterministic operator precedence from a logical perspective, but that is not the same as the order in which the conditions are evaluated against each row. In theory, the conditions of the final expression have no defined order of evaluation; the optimizer is free to evaluate them in any order that seems right and optimal, regardless of how they are expressed.
Based on index lookups, here, it might choose to find all b = 1 rows, then find c = 5 OR c = 27 within that set, and finally scan the resulting rows for the a = 1 condition. Such might be the case if you had a single index on columns (b,c). There is no shortcutting in the procedural sense: WHERE a = 1 AND b = 1 is logically identical to WHERE b = 1 AND a = 1.
Note that you should use more parentheses in the example queries so that the grouping of the AND/OR precedence is unambiguous. The server will get it right, of course, but the eyeballs are more easily deceived, and parentheses that make things unambiguous to the eyeballs will do no disservice to the MySQL optimizer, which seems to love them.
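If you do decide to tidy the generated query, dropping repeated groups and parenthesizing each one is cheap to do in the generator. The following is a minimal sketch, written in Python rather than the PHP of the question; the function name `build_where`, the input shape (one dict of column to value per OR branch), and the `?` placeholder style are all assumptions made for illustration. Column names are interpolated directly, so they must come from trusted code and never from user input.

```python
def build_where(groups):
    """Build a parenthesized, de-duplicated WHERE expression.

    groups: list of dicts mapping column name -> required value
            (hypothetical input shape, one dict per OR branch).
    Returns (sql, params) for use with a parameterized query.
    """
    seen = set()
    clauses = []
    params = []
    for group in groups:
        # Sort by column name so identical groups compare equal
        # regardless of the order in which their columns were listed.
        key = tuple(sorted(group.items()))
        if key in seen:
            continue  # skip a branch that has already been emitted
        seen.add(key)
        conditions = " AND ".join(f"{column} = ?" for column, _ in key)
        clauses.append(f"({conditions})")
        params.extend(value for _, value in key)
    return " OR ".join(clauses), params


groups = [
    {"flavor": "vanilla", "color": "purple", "quality": "aaa"},
    {"flavor": "marzipan", "color": "purple", "quality": "aaa"},
    {"flavor": "vanilla", "color": "purple", "quality": "aaa"},  # repeat
]
sql, params = build_where(groups)
print("SELECT * FROM icecream WHERE " + sql)
print(params)
```

With the three groups above, the repeated vanilla branch is dropped and two parenthesized branches remain. An empty input yields an empty string, so the caller would need to handle that case before appending WHERE.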