I'm running MySQL 5.7.30-33, and I'm hitting an issue that seems like MySQL is using the wrong index when running a query. I'm getting a 3 second query time using my existing query. However, just by changing the ORDER BY, removing the LIMIT, or forcing a USE INDEX I can get a 0.01 second query time. Unfortunately I need to stick with my original query (it's baked into an application), so it'd be great if this disparity could be resolved in the schema/indexing.
My table structure is as follows:
CREATE TABLE `referrals` (
`__id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`systemcreated` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`referrerid` mediumtext COLLATE utf8mb4_unicode_ci,
`referrersiteid` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
... lots more mediumtext fields ...
PRIMARY KEY (`__id`),
KEY `systemcreated` (`systemcreated`,`referrersiteid`,`__id`)
) ENGINE=InnoDB AUTO_INCREMENT=53368 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED
The table only has ~55k rows, but is very wide, as some of the fields contain huge BLOBs:
mysql> show table status like 'referrals'\G;
*************************** 1. row ***************************
Name: referrals
Engine: InnoDB
Version: 10
Row_format: Compressed
Rows: 45641
Avg_row_length: 767640
Data_length: 35035897856
Max_data_length: 0
Index_length: 3653632
Data_free: 3670016
Auto_increment: 54008
Create_time: 2020-12-12 12:46:14
Update_time: 2020-12-12 17:50:28
Check_time: NULL
Collation: utf8mb4_unicode_ci
Checksum: NULL
Create_options: row_format=COMPRESSED
Comment:
1 row in set (0.00 sec)
My customer's application queries the table using this, and unfortunately that can't easily be changed:
SELECT *
FROM referrals
WHERE `systemcreated` LIKE 'XXXXXX%'
AND `referrersiteid` LIKE 'XXXXXXXXXXXX%'
order by __id desc
limit 16;
This results in a query time around 3 seconds.
The EXPLAIN looks like this:
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | referrals | NULL | index | systemcreated | PRIMARY | 4 | NULL | 32 | 5.56 | Using where |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
Note that it is using the PRIMARY key for the query rather than the systemcreated
index.
If I change the query to use ASC rather than DESC:
SELECT *
FROM referrals
WHERE `systemcreated` LIKE 'XXXXXX%'
AND `referrersiteid` LIKE 'XXXXXXXXXXXX%'
order by __id asc
limit 16;
then it takes 0.01 seconds, and the EXPLAIN looks to be the same:
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | referrals | NULL | index | systemcreated | PRIMARY | 4 | NULL | 32 | 5.56 | Using where |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
If I change the query to stick with ORDER BY __id DESC, but remove the LIMIT:
SELECT *
FROM referrals
WHERE `systemcreated` LIKE 'XXXXXX%'
AND `referrersiteid` LIKE 'XXXXXXXXXXXX%'
order by __id desc;
then it also takes 0.01 seconds, with an EXPLAIN like this:
+----+-------------+-------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+
| 1 | SIMPLE | referrals | NULL | range | systemcreated | systemcreated | 406 | NULL | 2086 | 11.11 | Using index condition; Using filesort |
+----+-------------+-------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+
Alternatively, if I force the original query to use the systemcreated
index then it also gives a 0.01 sec query time. Here's the EXPLAIN:
mysql> explain SELECT *
FROM referrals USE INDEX (systemcreated)
WHERE `systemcreated` LIKE 'XXXXXX%'
AND `referrersiteid` LIKE 'XXXXXXXXXXXX%'
order by __id desc
limit 16;
+----+-------------+--------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+
| 1 | SIMPLE | referrals | NULL | range | systemcreated | systemcreated | 406 | NULL | 2086 | 11.11 | Using index condition; Using filesort |
+----+-------------+--------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+
Lastly, if I use the original ORDER BY __id DESC LIMIT 16 but select fewer fields, then it also returns in 0.01 seconds! Here's the explain:
mysql> explain SELECT field1, field2, field3, field4, field5
FROM referrals
WHERE `systemcreated` LIKE 'XXXXXX%'
AND `referrersiteid` LIKE 'XXXXXXXXXXXX%'
order by __id desc
limit 16;
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | referrals | NULL | index | systemcreated | PRIMARY | 4 | NULL | 32 | 5.56 | Using where |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
So the only combination that seems to be performing poorly is ORDER BY __id DESC LIMIT 16
.
I think I have the indexes setup correctly. I'm querying via the systemcreated
and referrersiteid
fields, and ordering by __id, so I have an index defined as (systemcreated, referrersiteid, __id), but MySQL still seems to be using the PRIMARY key.
Any suggestions?
"Avg_row_length: 767640"; lots of MEDIUMTEXT
. A row is limited to about 8KB; overflow goes into "off-record" blocks. Reading those blocks takes extra disk hits.
SELECT *
will reach for all those fat columns. The total will be about 50 reads (of 16KB each). This takes time.
(Exp 4) SELECT a,b,c,d
ran faster because it did not need to fetch all ~50 blocks per row.
Your secondary index, (systemcreated
,referrersiteid
,__id
), -- only the first column is useful. This is because of systemcreated LIKE 'xxx%'
. This is a "range". Once a range is hit, the rest of the index is ineffective. Except...
"Index hints" (USE INDEX(...)
) may help today but may make things worse tomorrow when the data distribution changes.
If you can't get rid of the wild cards in LIKE
, I recommend these two indexes:
INDEX(systemcreated)
INDEX(referrersiteid)
The real speedup can occur by turning the query inside out. That is, find the 16 ids first, then go looking for all those bulky columns:
SELECT r2... -- whatever you want
FROM
(
SELECT __id
FROM referrals
WHERE `systemcreated` LIKE 'XXXXXX%'
AND `referrersiteid` LIKE 'XXXXXXXXXXXX%'
order by __id desc
limit 16
) AS r1
JOIN referrals r2 USING(__id)
ORDER BY __id DESC -- yes, this needs repeating
And keep the 3-column secondary index that you have. Even if it must scan a lot more than 16 rows to find the 16 desired, it is a lot less bulky. This means that the subquery ("derived table") will be moderately fast. Then the outer query will still have 16 lookups -- possibly 16*50 blocks to read. The total number of blocks read will still be a lot less.
There is rarely a noticeable difference between ASC
and DESC
on ORDER BY
.
Why does the Optimizer pick the PK instead of the seemingly better secondary index? The PK might be best, especially if the 16 rows are at the 'end' (DESC) of the table. But that would be a terrible choice if it had to scan the entire table without finding 16 rows.
Meanwhile, the wildcard test makes the secondary index only partially useful. The Optimizer makes a decision based on inadequate statistics. Sometimes it feels like the flip of a coin.
If you use my inside-out reformulation, then I recommend the follow two composite indexes -- The Optimizer can make a semi-intelligent, semi-correct choice between them for the derived table:
INDEX(systemcreated, referrersiteid, __id),
INDEX(referrersiteid, systemcreated, __id)
It will continue to say "filesort", but don't worry; it's only sorting 16 rows.
And, remember, SELECT *
is hurting performance. (Though maybe you can't fix that.)