Search code examples
mysqlindexingmulti-index

Avoid filesort in simple filtered ordered query


I have a simple table:

CREATE TABLE `user_values` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `user_id` bigint NOT NULL,
  `value` varchar(100) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `user_id` (`user_id`,`id`),
  KEY `id` (`id`,`user_id`);
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

that I am trying to execute the following simple query:

select * from user_values where user_id in (20020, 20030) order by id desc;

I would fully expect this query to 100% use an index (either the (user_id, id) one or the (id, user_id) one) Yet, it turns out that's not the case:

explain select * from user_values where user_id in (20020, 20030); yields:

id select_type table partitions type key key_len ref rows filtered Extra
1 SIMPLE user_values NULL range user_id 8 NULL 9 100.00 Using index condition; Using filesort

Why is that the case? How can I avoid a filesort on this trivial query?


Solution

  • You can't avoid the filesort in the query you show.

    When you use a range predicate (for example, IN ( ) is a range predicate), and an index is used, the rows are read in index order. But there's no way for the MySQL query optimizer to guess that reading the rows in index order by user_id will guarantee they are also in id order. The two user_id values you are searching for are potentially scattered all over the table, in any order. Therefore MySQL must assume that once the matching rows are read, an extra step of sorting the result by id is necessary.

    Here's an example of hypothetical data in which reading the rows by an index on user_id will not be in id order.

    id user_id
    1 20030
    2 20020
    3 20016
    4 20030
    5 20020

    So when reading from an index on (user_id, id), the matching rows will be returned in the following order, sorted by user_id first, then by id:

    id user_id
    2 20020
    5 20020
    1 20030
    4 20030

    Clearly, the result is not in id order, so it needs to be sorted to satisfy the ORDER BY you requested.

    The same kind of effect happens for other type of predicates, for example BETWEEN, or < or != or IS NOT NULL, etc. Every predicate except for = is a range predicate.

    The only ways to avoid the filesort are to change the query in one of the following ways:

    Omit the ORDER BY clause and accepting the results in whatever order the optimizer chooses to return them, which could be in id order, but only by coincidence.

    Change the user_id IN (20020, 20030) to user_id = 20020, so there is only one matching user_id, and therefore reading the matching rows from the index will already be returned in the id order, and therefore the ORDER BY is a no-op. The optimizer recognizes when this is possible, and skips the filesort.