I'm trying to tune a very simple query:
select * from log where user_id in (...) order by id desc limit 25
I just want to show the last 25 events for a group of different user IDs (around 40 of them). This query takes around 50 seconds to run (>80 million records in the table).
By executing an EXPLAIN FORMAT=JSON I can see that the access_type is range. After some exploration, I learned that if I reduce the number of IDs to 9, the query planner will use another kind of access: index.
So I'm assuming that for a large number of IDs, MySQL will do a range scan between the smallest and largest id of the group, which could make sense if the IDs were 'close' together, which is not always the case. Maybe this volume of extra data becomes a problem when doing the sorting (as shown in the explain plan below).
40 IDs Explain
{
  "query_block": {
    "select_id": 1,
    "ordering_operation": {
      "using_filesort": true,
      "table": {
        "table_name": "log",
        "access_type": "range",
        "possible_keys": [
          "app_log_user_id"
        ],
        "key": "log_user_id",
        "used_key_parts": [
          "user_id"
        ],
        "key_length": "4",
        "rows": 6150,
        "filtered": 100,
        "index_condition": "(`app`.`log`.`user_id` in (<43 different ids from 12000 to 330000>))"
      }
    }
  }
}
9 IDs Explain
{
  "query_block": {
    "select_id": 1,
    "ordering_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "log",
        "access_type": "index",
        "possible_keys": [
          "app_log_user_id"
        ],
        "key": "PRIMARY",
        "used_key_parts": [
          "id"
        ],
        "key_length": "4",
        "rows": 6901,
        "filtered": 4552.8,
        "attached_condition": "(`app`.`log`.`user_id` in (<9 ids from 12000 to 18000>))"
      }
    }
  }
}
I ran an experiment: I split that query into 5 subqueries of 9 or fewer IDs each and combined them with a UNION, finishing with the ORDER BY and LIMIT clauses at the end. The query plan for this version became a little messy, even showing strange values: the estimated number of examined rows for one of the subqueries was 86737713 (which I think is a very wrong estimate; all the others were around 10246). Guess what? The query took "only" 6 seconds, much better than 50.
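For reference, the rewrite looked roughly like this (a sketch; the ID values below are placeholders, not my real ones):

(SELECT * FROM log WHERE user_id IN (12000, 12800, 13100, 14000, 15500, 16200, 17000, 17800, 18000))
UNION
(SELECT * FROM log WHERE user_id IN (19000, 21000, 23500, 25000, 27800, 30100, 32000, 33900, 35000))
-- ... three more branches of 9 or fewer IDs each ...
ORDER BY id DESC
LIMIT 25;

Since each branch can contribute at most 25 useful rows to the final result, adding ORDER BY id DESC LIMIT 25 inside each parenthesized branch should be a cheap extra trim; and because the branches select disjoint user_ids, UNION ALL would skip a needless de-duplication pass.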
I don't know which strategies are used to optimize this kind of query, but from my humble knowledge it seems that if I could tell the optimizer to use an access_type of index instead of range, it would perform better. Is that possible?
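(For what it's worth, the closest standard tool I know of is an index hint; below is a sketch, assuming FORCE INDEX (PRIMARY) can steer the optimizer toward the PK scan it chose on its own for the 9-ID case. I have not verified that it actually wins here:)

SELECT *
FROM log FORCE INDEX (PRIMARY)          -- hint: scan the PK in id order, filtering by user_id along the way
WHERE user_id IN (12000, 13000, 14000)  -- placeholder IDs; imagine ~40 of them
ORDER BY id DESC
LIMIT 25;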
Extra Details
user_id has a foreign key and an index.
SHOW CREATE TABLE:
CREATE TABLE `app_log` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `timestamp` datetime NOT NULL,
  `user_id` int(11) NOT NULL,
  `content_type_id` int(11) NOT NULL,
  `object_id` int(10) unsigned NOT NULL,
  `status` int(11) DEFAULT NULL,
  `type` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `app_log_content_type_id` (`content_type_id`),
  KEY `app_log_144dd2a9` (`timestamp`),
  KEY `app_log_user_id` (`user_id`, `id`)
) ENGINE = InnoDB AUTO_INCREMENT = 108628300 DEFAULT CHARSET = latin1
The likely explanation:
What version of MySQL/MariaDB are you using? I would guess MySQL 5.6. (Your use of FORMAT=JSON confirms "at least 5.6.5".)
MySQL 5.6 introduced eq_range_index_dive_limit, with a default of 10. In 5.7 the eq_range_index_dive_limit default was raised to 200; this affects IN() lists.
The likely workaround:
That setting possibly explains your 9 vs 43 items in the IN list: past the limit, the optimizer stops doing an index dive for each range and falls back to index statistics, which can change the plan. Suggest you play with eq_range_index_dive_limit.
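For example, at session scope (43 items need a limit above 43):

-- Check the current value (default 10 in 5.6; raised to 200 in 5.7)
SELECT @@eq_range_index_dive_limit;

-- Raise it above the 43 items in your IN() list so the optimizer keeps
-- doing real index dives instead of falling back to index statistics
SET SESSION eq_range_index_dive_limit = 200;

Then re-run the EXPLAIN and the query to see whether the estimates and the chosen plan change.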
Trivia
KK = thousand-thousand
M, to accountants = 'mille' = thousand
MM, to accountants = million, a la KK
Lakh, to Indians = 100K
Crore, to Indians = 10M (10 million)
Billion, to Brits, used to mean million-million; fortunately, that confusion seems to have gone away.
The distinction between 1000 and 1024 (and KB vs KiB), etc, can, for all practical purposes, be ignored in this Forum.