Search code examples
mysqlperformanceinnodb

Query performance on primary index vs index


I have a table on mysql and two queries whose performances are quite different. I have extracted plans of the queries, but I couldn't fully understand the reason behind the performance difference.

The table:

+-------------+----------------------------------------------+------------------------------------+
|   TableA    |                                              |                                    |
+-------------+----------------------------------------------+------------------------------------+
| id          | int(10)   unsigned NOT NULL AUTO_INCREMENT   |                                    |
| userId      | int(10)                                      | unsigned DEFAULT NULL              |
| created     | timestamp                                    | NOT NULL DEFAULT CURRENT_TIMESTAMP |
| PRIMARY KEY | id                                           |                                    |
| KEY userId  | userId                                       |                                    |
| KEY created | created                                      |                                    |
+-------------+----------------------------------------------+------------------------------------+

Keys/Indices: The primary key on id field, a key on userId field ASC , another key on created field ASC.

tableA is a very big table, it contains millions of rows.

The query I run on this table is:

The user with id 1234 has 1.5M records in this table. I want to fetch its latest 100 rows. In order to achieve this, I have 2 different queries:

Query 1:

SELECT * FROM tableA USE INDEX (userId) 
WHERE userId=1234 ORDER BY created DESC LIMIT 100;

Query 2:

SELECT * FROM tableA 
WHERE userId=1234 ORDER BY id DESC LIMIT 100;

Since id field of tableA is auto increment, the condition of being latest is preserved. These 2 queries return the same result. However, there is a huge performance difference.

Query plans are:

+----------+-----------------------------------------------+-------------------------------+------+---------------------------------------+
| Query No |                   Operation                   |            Params             | Raws |               Raw desc                |
+----------+-----------------------------------------------+-------------------------------+------+---------------------------------------+
| Query 1  | Sort(using file sort) Unique index scan (ref) | table: tableA; index: userId; | 2.5M | Using index condition; Using filesort |
| Query 2  | Unique index scan (ref)                       | table: tableA; index: userId; | 2.5M | Using where                           |
+----------+-----------------------------------------------+-------------------------------+------+---------------------------------------+


+--------+-------------+
|        | Performance |
+--------+-------------+
| Query1 | 7,5 s       |
+--------+-------------+
| Query2 | 741 ms      |
+--------+-------------+

I understand that there is a sorting operation on Query 1. In each query, the index used is userId. But why is sorting not used in Query 2? How does the primary index affect?

Mysql 5.7

Edit: There are more columns on the table, I have extracted them from the table definition above.


Solution

  • Since id field of tableA is auto increment, the condition of being latest is preserved.

    That is usually a valid statement.

    WHERE userId=1234 ORDER BY created DESC LIMIT 100
    

    needs this 'composite' index: (userId, created). With that, it will hit only 100 rows, regardless of the table size or the number of rows for that user.

    The same goes for

    WHERE userId=1234 ORDER BY id DESC LIMIT 100;
    

    Namely that it needs (userId, id). However, in InnoDB, when you say INDEX(x) it silently tacks on the PRIMARY KEY columns. So you effectively get INDEX(x,id). This is why your plain INDEX(userId) worked well.

    EXPLAIN rarely (if ever) takes into account the LIMIT. This is why 'Rows' is "2.5M" for both queries.

    The first query might (or might not) have used INDEX(userId) if you took out the USE INDEX hint. The choice depends on what percentage of the table has userId = 1234. If it is less than about 20%, the index would be used. But it would bounce back and forth between the secondary index and the data -- all 1.5 million times. If more than 20%, it would avoid the bouncing by simply reading all the "millions" of rows, ignoring those that don't apply.

    Note: What you had for Q1 will still read at least 1.5M rows, sort them ("Using filesort"), then peel off the desired 100. But with INDEX(userId, created), it can skip the sort and look at only 100 rows.

    I cannot explain "Unique index scan" without seeing SHOW CREATE TABLE and the un-annotated EXPLAIN. (EXPLAIN FORMAT=JSON SELECT... might provide more insight.)