I am using using MySQL (InnoDB engine). I am running SQL query with WHERE IN
statement which is also ordered.
I decided to create the perfect index, so my query will not run slow. Surprisingly, after creating the perfect index MySQL is still using filesort when running my query.
My Query:
SELECT *
FROM `events`
WHERE
`events`.`user_id` IN (118, 105, 126, 196, 338)
ORDER BY
`date` ASC,
`planing_hour_id` ASC;
My perfect Index:
ALTER TABLE `events`
ADD INDEX `my_perfect_index` (
`user_id`,
`date` ASC,
`planing_hour_id` ASC
);
Problem:
MySQL still uses filesort even when there is perfect index available. I would expect only Using Index
condition to be present in output of EXPLAIN
query. Why is this not the case?
Disclaimer:
I checked similar topics and none of them has been useful:
You can eliminate "Using filesort" if MySQL's natural order of reading the rows matches the order you request in your ORDER BY. Thus there is no sorting needed, because they are already read in the right order.
MySQL always reads in index order. Whether it uses a secondary index like yours, or the primary key (also called the clustered index), rows will be read in the order they are stored in that index. So if that order does not match your ORDER BY, then a filesort is required.
(Note "filesort" doesn't necessarily mean it will be on disk. If there are few rows in the result, the sorting can be done in RAM relatively quickly.)
So why doesn't your index count as a "perfect" index?
Think of this analogy:
SELECT ... FROM TelephoneBook
WHERE last_name IN ('Addams', 'Kirk', 'Smith')
ORDER BY first_name;
The result:
last_name | first_name |
---|---|
Addams | Morticia |
Kirk | James |
Smith | Sarah Jane |
These are read in index order by last_name
. But they are not implicitly in order by first_name
, the order requested in the query. The query needs a phase to filesort the set of matching rows.
If you were to make the "perfect" index this way:
ALTER TABLE `TelephoneBook`
ADD INDEX `my_perfect_index` (
`first_name`,
`last_name`
);
The query might use this index, and eliminate the filesort, because it's reading the rows in order by first_name
. However, this can't also do the search to find rows matching last_name
most efficiently.
The best it can do is index condition pushdown, which is to tell the storage engine not to return rows that don't match the index — but the storage engine does have to examine all the indexes to evaluate them. This is not as good for optimization as reading from an index leading with last_name
.
Ultimately, there is no perfect index in cases where the query must search with a range condition (IN()
qualifies as a range condition), and also do a sort that doesn't match the column of the range condition.