Search code examples
mysqlperformanceselectinner-join

More efficient to use subquery before inner joins?


I'm just in the process of learning MYSQL, and have something I've been wondering about.

Let's take this simple scenario: A hypothetical website for taking online courses, comprised of 4 tables: Students, Teachers, Courses and Registrations (one entry per course that a student has registered for)

You can find the DB generation code on github.

While the provided DB is tiny for clarity, to keep it relevant to what I need help with, let's assume that this is with a large enough database where efficiency would be a real issue - let's say hundreds of thousands of students, teachers, etc.



As far as I understand with MYSQL, if we want a table of students being taught by 'Charles Darwin', one possible query would be this:

Method 1

SELECT Students.name FROM Teachers
INNER JOIN Courses ON Teachers.id = Courses.teacher_id
INNER JOIN Registrations ON Courses.id = Registrations.course_id
INNER JOIN Students ON Registrations.student_id = Students.id
WHERE Teachers.name = "Charles Darwin"

which does indeed return what we want.

+----------------+
| name           |
+----------------+
| John Doe       |
| Jamie Heineman |
| Claire Doe     |
+----------------+


So Here's my question:

With my (very) limited MYSQL knowledge, it seems to me that here we are JOIN-ing elements onto the teachers table, which could be quite large, while we are ultimately only after a single teacher, who we filter out at the very very end of the query.

My 'Intuition' Says that it would be much more efficient to first get a single row for the teacher we need, and then join the remaining stuff onto that instead:

Method 2

SELECT Students.name FROM (SELECT Teachers.id FROM Teachers WHERE Teachers.name = 
"Charles Darwin") as Teacher
INNER JOIN Courses ON Teacher.id = Courses.teacher_id
INNER JOIN Registrations ON Courses.id = Registrations.course_id
INNER JOIN Students ON Registrations.student_id = Students.id

But is that really the case? Assuming thousands of teachers and students, is this more efficient than the first query? It could be that MYSQL is smart enough to parse the method 1 query in such a way that it runs more efficiently.


Also, if anyone could suggest an even more efficient query, I would be quite interested to hear it too.

Note: I've read before to use EXPLAIN to figure out how efficient a query is, but I don't understand MYSQL well enough to be able to decipher the result. Any insight here would be much appreciated as well.


Solution

  • My 'Intuition' Says that it would be much more efficient to first get a single row for the teacher we need, and then join the remaining stuff onto that instead:

    You are getting a single row for teacher in method 1 by using the predicate Teachers.name = "Charles Darwin". The query optimiser should determine that it is more efficient to restrict the Teacher set using this predicate before joining the other tables.

    If you don't trust the optimiser or want to lessen the work it does you can even force the table read order by using SELECT STRAIGHT_JOIN ... or STRAIGHT_JOIN instead of INNER_JOIN to make sure that MySQL reads the tables in the order you have specified in the query.

    Your second query results in the same answer but may be less efficient because a temporary table is created for your teacher subquery.

    The EXPLAIN documentation is a good source on how to interpret the EXPLAIN output.