so I have two tables in mysql: articles and articles_rubrics, both with ~20.000 rows
article has multiple cols, but its article_id is indexed.
articles_rubrics has only two cols: article_id and rubrics_id and both are indexed separately and on top of that there is joined index of these two.
My issue is that when I select data from these tables with join, the order is extremely important, which is an issue for me and I don't understand the reason for it:
SELECT article_id,rubric_id FROM articles
LEFT JOIN articles_rubrics USING(article_id)
WHERE rubric_id=1
ORDER BY article_id DESC
LIMIT 10;
and explain says (for articles_rubrics) this:
time: 0.312 s
key_len: 1
ref: const
rows: 7352
extra: Using where; Using temporary; Using filesort
But when I switch the order of it:
SELECT article_id,rubric_id FROM articles_rubrics
LEFT JOIN articles USING(article_id)
WHERE rubric_id=1
ORDER BY article_id DESC
LIMIT 10;
and explain says (for articles_rubrics) this:
time: 0.001 s
key_len:9
ref: NULL
rows: 28
extra: Using where; Using index
So I have two tables and this makes their querying go ~300 times slower/faster. How is that even possible?
PS: I've heavily simplified my real world problem for this example, but I stumbled upon this because my
SELECT * FROM articles [LEFT JOIN for 5 other tables]
was taking 1.5s and when I actually added other join to the mix, execution time changed to like 0.006s.
Show index:
show index from articles;
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
articles 0 PRIMARY 1 article_id A 20043 NULL NULL BTREE
articles 1 article_url_title 1 article_url_title A 10021 NULL NULL BTREE
articles 1 FULLTEXT 1 article_title NULL 1 NULL NULL FULLTEXT
articles 1 FULLTEXT 2 article_content NULL 1 NULL NULL FULLTEXT
show index from articles_rubrics;
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type
articles_rubrics 0 PRIMARY 1 article_id A NULL NULL NULL BTREE
articles_rubrics 0 PRIMARY 2 rubric_id A 20814 NULL NULL BTREE
articles_rubrics 1 rubric_id 1 rubric_id A 17 NULL NULL BTREE
articles_rubrics 1 article_id 1 article_id A 20814 NULL NULL BTREE
These two queries run the same, only difference is in using article_id from articles_rubrics in both.
-- SELECT article_id,rubric_id FROM articles -- would be slow here
SELECT ar.article_id,ar.rubric_id FROM articles
JOIN articles_rubrics ar USING(article_id)
WHERE rubric_id=1
ORDER BY article_id DESC
LIMIT 10;
SELECT ar.article_id,ar.rubric_id FROM articles_rubrics ar
JOIN articles USING(article_id)
WHERE rubric_id=1
ORDER BY article_id DESC
LIMIT 10;
If I force the sql server to use articles_rubrics table in the result, he correctly decides that the articles aren't actually needed. The server, however, won't do that automatically, even though article_id is used as a key.
I still don't fully understand why it's happening (or how the optimization algorithm actually works), because in both cases, where rubric_id=1
goes into the articles_rubrics
table and in both cases, the selected columns are there already (and join articles
for existence is ran again, in both cases).
However, for some reason, in the first example, the server decides to load all the articles first and only then, he checks each one for the rubric_id
.