I'm trying to tune a very simple query:
select * from log where user_id in (...) order by id desc limit 25
I just want to show the last 25 events for a group of different user IDs (around 40 of them). This query takes around 50 seconds to run (>80 million records in the table).
By executing an EXPLAIN FORMAT=JSON I can see that the access_type is range. After some exploration, I learned that if I reduce the number of IDs to 9, the query planner will use another kind of access: index.
So I'm assuming that for a large number of IDs, MySQL will do a range scan between the smallest and largest id of the group, which could make sense if the IDs were 'close' together, which is not always the case. Maybe this volume of extra data becomes a problem when doing the sorting (as shown in the explain plan below).
40 IDs Explain
{
  "query_block": {
    "select_id": 1,
    "ordering_operation": {
      "using_filesort": true,
      "table": {
        "table_name": "log",
        "access_type": "range",
        "possible_keys": [
          "app_log_user_id"
        ],
        "key": "log_user_id",
        "used_key_parts": [
          "user_id"
        ],
        "key_length": "4",
        "rows": 6150,
        "filtered": 100,
        "index_condition": "(`app`.`log`.`user_id` in (<43 different ids from 12000 to 330000>))"
      }
    }
  }
}
9 IDs Explain
{
  "query_block": {
    "select_id": 1,
    "ordering_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "log",
        "access_type": "index",
        "possible_keys": [
          "app_log_user_id"
        ],
        "key": "PRIMARY",
        "used_key_parts": [
          "id"
        ],
        "key_length": "4",
        "rows": 6901,
        "filtered": 4552.8,
        "attached_condition": "(`app`.`log`.`user_id` in (<9 ids from 12000 to 18000>))"
      }
    }
  }
}
I ran an experiment: I split that query into 5 subqueries of 9 or fewer IDs each and combined them with a UNION, finishing with the ORDER BY and LIMIT clauses at the end. The query plan for this version became a little messy, even showing strange values: the estimated number of examined rows for one of the subqueries was 86737713 (which I think is a very wrong estimate; all the others were around 10246). Guess what? The query took "only" 6 seconds, much better than 50.
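For reference, the rewrite looked roughly like this (a sketch; the ID values below are placeholders, not my real ones):

(SELECT * FROM log WHERE user_id IN (12000, 12800, 13100, 14000, 15500, 16200, 17000, 17800, 18000))
UNION
(SELECT * FROM log WHERE user_id IN (19000, 21000, 23500, 25000, 27800, 30100, 32000, 33900, 35000))
-- ... three more branches of 9 or fewer IDs each ...
ORDER BY id DESC
LIMIT 25;

Since each branch can contribute at most 25 useful rows to the final result, adding ORDER BY id DESC LIMIT 25 inside each parenthesized branch should be a cheap extra trim; and because the branches select disjoint user_ids, UNION ALL would skip a needless de-duplication pass.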
I don't know which strategies are used to optimize this kind of query, but from my humble knowledge it seems that if I could tell the optimizer to use an access_type of index instead of range, it would perform better. Is that possible?
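(For what it's worth, the closest standard tool I know of is an index hint; below is a sketch, assuming FORCE INDEX (PRIMARY) can steer the optimizer toward the PK scan it chose on its own for the 9-ID case. I have not verified that it actually wins here:)

SELECT *
FROM log FORCE INDEX (PRIMARY)          -- hint: scan the PK in id order, filtering by user_id along the way
WHERE user_id IN (12000, 13000, 14000)  -- placeholder IDs; imagine ~40 of them
ORDER BY id DESC
LIMIT 25;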
Extra Details
user_id has a foreign key and an index.
SHOW CREATE TABLE:
CREATE TABLE `app_log` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `timestamp` datetime NOT NULL,
  `user_id` int(11) NOT NULL,
  `content_type_id` int(11) NOT NULL,
  `object_id` int(10) unsigned NOT NULL,
  `status` int(11) DEFAULT NULL,
  `type` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `app_log_content_type_id` (`content_type_id`),
  KEY `app_log_144dd2a9` (`timestamp`),
  KEY `app_log_user_id` (`user_id`, `id`)
) ENGINE = InnoDB AUTO_INCREMENT = 108628300 DEFAULT CHARSET = latin1
The likely explanation:
What version of MySQL/MariaDB are you using? I would guess MySQL 5.6. (Your use of FORMAT=JSON confirms "at least 5.6.5".)
MySQL 5.6 introduced eq_range_index_dive_limit, with a default of 10. In 5.7 the eq_range_index_dive_limit default was raised to 200; this affects IN() lists.
The likely workaround:
That setting possibly explains your 9 vs 43 items in the IN list: past the limit, the optimizer stops doing an index dive for each range and falls back to index statistics, which can change the plan. Suggest you play with eq_range_index_dive_limit.
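For example, at session scope (43 items need a limit above 43):

-- Check the current value (default 10 in 5.6; raised to 200 in 5.7)
SELECT @@eq_range_index_dive_limit;

-- Raise it above the 43 items in your IN() list so the optimizer keeps
-- doing real index dives instead of falling back to index statistics
SET SESSION eq_range_index_dive_limit = 200;

Then re-run the EXPLAIN and the query to see whether the estimates and the chosen plan change.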
Trivia
KK = thousand-thousand
M, to accountants = 'mille' = thousand
MM, to accountants = million, a la KK
Lakh, to Indians = 100K
Crore, to Indians = 10M (10 million)
Billion, to Brits, used to mean million-million; fortunately, that confusion seems to have gone away.
The distinction between 1000 and 1024 (and KB vs KiB), etc, can, for all practical purposes, be ignored in this Forum.