Search code examples
mysqlindexinginnodb

MySQL query is slow only when using ORDER BY field DESC and LIMIT


Overview

I'm running MySQL 5.7.30-33, and I'm hitting an issue that seems like MySQL is using the wrong index when running a query. I'm getting a 3 second query time using my existing query. However, just by changing the ORDER BY, removing the LIMIT, or forcing a USE INDEX I can get a 0.01 second query time. Unfortunately I need to stick with my original query (it's baked into an application), so it'd be great if this disparity could be resolved in the schema/indexing.

Setup / problem

My table structure is as follows:

CREATE TABLE `referrals` (
  `__id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `systemcreated` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `referrerid` mediumtext COLLATE utf8mb4_unicode_ci,
  `referrersiteid` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  ... lots more mediumtext fields ...
  PRIMARY KEY (`__id`),
  KEY `systemcreated` (`systemcreated`,`referrersiteid`,`__id`)
) ENGINE=InnoDB AUTO_INCREMENT=53368 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED

The table only has ~55k rows, but is very wide, as some of the fields contain huge BLOBs:

mysql> show table status like 'referrals'\G;
*************************** 1. row ***************************
           Name: referrals
         Engine: InnoDB
        Version: 10
     Row_format: Compressed
           Rows: 45641
 Avg_row_length: 767640
    Data_length: 35035897856
Max_data_length: 0
   Index_length: 3653632
      Data_free: 3670016
 Auto_increment: 54008
    Create_time: 2020-12-12 12:46:14
    Update_time: 2020-12-12 17:50:28
     Check_time: NULL
      Collation: utf8mb4_unicode_ci
       Checksum: NULL
 Create_options: row_format=COMPRESSED
        Comment: 
1 row in set (0.00 sec)

My customer's application queries the table using this, and unfortunately that can't easily be changed:

SELECT  *
    FROM  referrals
    WHERE  `systemcreated` LIKE 'XXXXXX%'
      AND  `referrersiteid` LIKE 'XXXXXXXXXXXX%'
    order by  __id desc
    limit  16;

This results in a query time around 3 seconds.

The EXPLAIN looks like this:

+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table       | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | referrals   | NULL       | index | systemcreated | PRIMARY | 4       | NULL |   32 |     5.56 | Using where |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

Note that it is using the PRIMARY key for the query rather than the systemcreated index.

Experimentation 1

If I change the query to use ASC rather than DESC:

SELECT  *
    FROM  referrals
    WHERE  `systemcreated` LIKE 'XXXXXX%'
      AND  `referrersiteid` LIKE 'XXXXXXXXXXXX%'
    order by  __id asc
    limit  16;

then it takes 0.01 seconds, and the EXPLAIN looks to be the same:

+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table       | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | referrals   | NULL       | index | systemcreated | PRIMARY | 4       | NULL |   32 |     5.56 | Using where |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

Experimentation 2

If I change the query to stick with ORDER BY __id DESC, but remove the LIMIT:

SELECT  *
    FROM  referrals
    WHERE  `systemcreated` LIKE 'XXXXXX%'
      AND  `referrersiteid` LIKE 'XXXXXXXXXXXX%'
    order by  __id desc;

then it also takes 0.01 seconds, with an EXPLAIN like this:

+----+-------------+-------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+
| id | select_type | table       | partitions | type  | possible_keys | key           | key_len | ref  | rows | filtered | Extra                                 |
+----+-------------+-------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+
|  1 | SIMPLE      | referrals   | NULL       | range | systemcreated | systemcreated | 406     | NULL | 2086 |    11.11 | Using index condition; Using filesort |
+----+-------------+-------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+

Experimentation 3

Alternatively, if I force the original query to use the systemcreated index then it also gives a 0.01 sec query time. Here's the EXPLAIN:

mysql> explain     SELECT  *
    FROM  referrals USE INDEX (systemcreated)
    WHERE  `systemcreated` LIKE 'XXXXXX%'
      AND  `referrersiteid` LIKE 'XXXXXXXXXXXX%'
    order by  __id desc
    limit  16;

+----+-------------+--------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+
| id | select_type | table        | partitions | type  | possible_keys | key           | key_len | ref  | rows | filtered | Extra                                 |
+----+-------------+--------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+
|  1 | SIMPLE      | referrals    | NULL       | range | systemcreated | systemcreated | 406     | NULL | 2086 |    11.11 | Using index condition; Using filesort |
+----+-------------+--------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+

Experimentation 4

Lastly, if I use the original ORDER BY __id DESC LIMIT 16 but select fewer fields, then it also returns in 0.01 seconds! Here's the explain:

mysql> explain     SELECT  field1, field2, field3, field4, field5
    FROM  referrals
    WHERE  `systemcreated` LIKE 'XXXXXX%'
      AND  `referrersiteid` LIKE 'XXXXXXXXXXXX%'
    order by  __id desc
    limit  16;

+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table       | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | referrals   | NULL       | index | systemcreated | PRIMARY | 4       | NULL |   32 |     5.56 | Using where |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

Summary

So the only combination that seems to be performing poorly is ORDER BY __id DESC LIMIT 16.

I think I have the indexes setup correctly. I'm querying via the systemcreated and referrersiteid fields, and ordering by __id, so I have an index defined as (systemcreated, referrersiteid, __id), but MySQL still seems to be using the PRIMARY key.

Any suggestions?


Solution

    • "Avg_row_length: 767640"; lots of MEDIUMTEXT. A row is limited to about 8KB; overflow goes into "off-record" blocks. Reading those blocks takes extra disk hits.

    • SELECT * will reach for all those fat columns. The total will be about 50 reads (of 16KB each). This takes time.

    • (Exp 4) SELECT a,b,c,d ran faster because it did not need to fetch all ~50 blocks per row.

    • Your secondary index, (systemcreated,referrersiteid,__id), -- only the first column is useful. This is because of systemcreated LIKE 'xxx%'. This is a "range". Once a range is hit, the rest of the index is ineffective. Except...

    • "Index hints" (USE INDEX(...)) may help today but may make things worse tomorrow when the data distribution changes.

    • If you can't get rid of the wild cards in LIKE, I recommend these two indexes:

        INDEX(systemcreated)
        INDEX(referrersiteid)
      
    • The real speedup can occur by turning the query inside out. That is, find the 16 ids first, then go looking for all those bulky columns:

        SELECT  r2...   -- whatever you want
            FROM  
            (
                SELECT  __id
                    FROM  referrals
                    WHERE  `systemcreated` LIKE 'XXXXXX%'
                      AND  `referrersiteid` LIKE 'XXXXXXXXXXXX%'
                    order by  __id desc
                    limit  16 
            ) AS r1
            JOIN  referrals r2 USING(__id)
            ORDER BY  __id DESC   -- yes, this needs repeating 
      

    And keep the 3-column secondary index that you have. Even if it must scan a lot more than 16 rows to find the 16 desired, it is a lot less bulky. This means that the subquery ("derived table") will be moderately fast. Then the outer query will still have 16 lookups -- possibly 16*50 blocks to read. The total number of blocks read will still be a lot less.

    There is rarely a noticeable difference between ASC and DESC on ORDER BY.

    Why does the Optimizer pick the PK instead of the seemingly better secondary index? The PK might be best, especially if the 16 rows are at the 'end' (DESC) of the table. But that would be a terrible choice if it had to scan the entire table without finding 16 rows.

    Meanwhile, the wildcard test makes the secondary index only partially useful. The Optimizer makes a decision based on inadequate statistics. Sometimes it feels like the flip of a coin.

    If you use my inside-out reformulation, then I recommend the follow two composite indexes -- The Optimizer can make a semi-intelligent, semi-correct choice between them for the derived table:

    INDEX(systemcreated, referrersiteid, __id),
    INDEX(referrersiteid, systemcreated, __id)
    

    It will continue to say "filesort", but don't worry; it's only sorting 16 rows.

    And, remember, SELECT * is hurting performance. (Though maybe you can't fix that.)