Search code examples
mysqljoinoptimizationexplain

Clarification of join order for creation of temporary tables


I have a large query in mysql that involves joining multiple tables together. It's too slow, so I've done "explain" and see that it's creating a temporary table, which I suspect of taking most of the execution time. I found some related information:

  1. The mysql docs describe conditions when a temporary table might be created. ("The server creates temporary tables under conditions such as these..." [Emphasis added])
  2. This related SO question Using index, using temporary, using filesort - how to fix this?, which provides a link to the doc and applies it in a specific case.
  3. This related SO question Order of join conditions important? that talks about the order in which joins are evaluated.

My query does not appear to meet any of the conditions listed in the docs #1, in the order that the joins were written by me. By experimentation, however, I find that if I remove my order by clause, the temporary table is not created. That makes me look at this rule from the doc:

Evaluation of statements that contain an ORDER BY clause and a different GROUP BY clause, or for which the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue.

This is the same rule that played in the example at #2 above, but in #2, the OP explicitly had columns from multiple tables in the order by clause, so that's at least superficially different.

Moreover, when I look at the output from explain, it appears that the table that I listed first is not used first by the optimizer. Putting down a pseudo-query for example:

select * from A 
join B on A.c1=B.c1
join C on A.c2=C.c2
where A.c3='value'
order by A.c4

I would say that my order by clause does use only columns from the "first table in the join queue" based on the order that I've written the query. On the other hand, the output from explain suggests that it first considers table B then A.

So here are the questions:

  1. Does the quoted rule above for use of temporary tables refer to the order that I write the tables or the order that the software chooses to evaluate them?
  2. If it's the order that I write them, does this mean that the order of the joins does impact performance? (Seems to contradict the claims at #3 above.)
  3. If it's the order that the software chooses to evaluate them, is there any way to coerce or trick it into selecting and order that doesn't use the table?

Solution

    1. It refers to the order in which the optimiser evaluates them (join queue). The optimiser may not even be aware of the order of the tables in your sql statement.

    2. No, it does not contradict what's been written in #3, since the answer explicitly writes (emphasis is mine):

    has no effect on the result

    The result and performance are two different things. Actually, there is an upvoted comment to the answer saying that

    But it might affect the query plan (=> performance)

    1. You can tell the optimiser which table to process first by using straight_join:

      STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer puts the tables in the wrong order.

    However, you need to be careful with that because you tie the optimiser's hand. See this SO topic on discussing advantages and disadvantages of straight_join.

    Number of records, where criteria, indexes - they all play their part in the optimiser's decision of the processing order of tables. There is no magic bullet, you need to play around a bit and probably you can trick the optimiser to change the order of the tables.