I have a large query in mysql that involves joining multiple tables together. It's too slow, so I've done "explain" and see that it's creating a temporary table, which I suspect of taking most of the execution time. I found some related information:
My query does not appear to meet any of the conditions listed in the docs #1, in the order that the joins were written by me. By experimentation, however, I find that if I remove my order by
clause, the temporary table is not created. That makes me look at this rule from the doc:
Evaluation of statements that contain an ORDER BY clause and a different GROUP BY clause, or for which the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue.
This is the same rule that played in the example at #2 above, but in #2, the OP explicitly had columns from multiple tables in the order by
clause, so that's at least superficially different.
Moreover, when I look at the output from explain
, it appears that the table that I listed first is not used first by the optimizer. Putting down a pseudo-query for example:
select * from A
join B on A.c1=B.c1
join C on A.c2=C.c2
where A.c3='value'
order by A.c4
I would say that my order by
clause does use only columns from the "first table in the join queue" based on the order that I've written the query. On the other hand, the output from explain
suggests that it first considers table B then A.
So here are the questions:
It refers to the order in which the optimiser evaluates them (join queue). The optimiser may not even be aware of the order of the tables in your sql statement.
No, it does not contradict what's been written in #3, since the answer explicitly writes (emphasis is mine):
has no effect on the result
The result and performance are two different things. Actually, there is an upvoted comment to the answer saying that
But it might affect the query plan (=> performance)
You can tell the optimiser which table to process first by using straight_join:
STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer puts the tables in the wrong order.
However, you need to be careful with that because you tie the optimiser's hand. See this SO topic on discussing advantages and disadvantages of straight_join.
Number of records, where criteria, indexes - they all play their part in the optimiser's decision of the processing order of tables. There is no magic bullet, you need to play around a bit and probably you can trick the optimiser to change the order of the tables.