MySQL Index - not full table is in index

I have a simple InnoDB table with 1M+ rows and some simple indexes. I need to sort this table by first_public and id columns and get some of them, this is why I've indexed first_public column.

first_public is unique at the moment, but in real life it might be not.

mysql> desc table;
+--------------+-------------------------+------+-----+---------+----------------+
| Field        | Type                    | Null | Key | Default | Extra          |
+--------------+-------------------------+------+-----+---------+----------------+
| id           | bigint unsigned         | NO   | PRI | NULL    | auto_increment |
| name         | varchar(255)            | NO   |     | NULL    |                |
| id_category  | int                     | NO   | MUL | NULL    |                |
| active       | smallint                | NO   |     | NULL    |                |
| status       | enum('public','hidden') | NO   |     | NULL    |                |
| first_public | datetime                | YES  | MUL | NULL    |                |
| created_at   | timestamp               | YES  |     | NULL    |                |
| updated_at   | timestamp               | YES  |     | NULL    |                |
+--------------+-------------------------+------+-----+---------+----------------+
8 rows in set (0.06 sec)

it works well while I'm working with rows before 130000+

mysql> explain select id from table where active = 1 and status = 'public' order by first_public desc, id desc limit 24 offset 130341;
+----+-------------+--------+------------+-------+---------------+---------------------+---------+------+--------+----------+----------------------------------+
| id | select_type | table  | partitions | type  | possible_keys | key                 | key_len | ref  | rows   | filtered | Extra                            |
+----+-------------+--------+------------+-------+---------------+---------------------+---------+------+--------+----------+----------------------------------+
|  1 | SIMPLE      | table  | NULL       | index | NULL          | firstPublicDateIndx | 6       | NULL | 130365 |     5.00 | Using where; Backward index scan |
+----+-------------+--------+------------+-------+---------------+---------------------+---------+------+--------+----------+----------------------------------+
1 row in set, 1 warning (0.00 sec)

but when I try to get some next rows (with offset 140000+), it looks like MySQL don't use first_public column index at all.

mysql> explain select id from table where active = 1 and status = 'public' order by first_public desc, id desc limit 24 offset 140341;
+----+-------------+--------+------------+------+---------------+------+---------+------+---------+----------+-----------------------------+
| id | select_type | table  | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra                       |
+----+-------------+--------+------------+------+---------------+------+---------+------+---------+----------+-----------------------------+
|  1 | SIMPLE      | table  | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1133533 |     5.00 | Using where; Using filesort |
+----+-------------+--------+------------+------+---------------+------+---------+------+---------+----------+-----------------------------+
1 row in set, 1 warning (0.00 sec)

I tried to add first_public column in to select clause, but nothing changed. What I'm doing wrong?

Solution

MySQL's optimizer tries to estimate the cost of doing your query, to decide if it's worth using an index. Sometimes it compares the cost of using the index versus just reading the rows in order, and discarding the ones that don't belong in the result.

In this case, it decided that if you use an OFFSET greater than 140k, it gives up on using the index.

Keep in mind how OFFSET works. There's no way of looking up the location of an offset by an index. Indexes help to look up rows by value, not by position. So to do an OFFSET query, it has to examine all the rows from the first matching row on up. Then it discards the rows it examined up to the offset, and then counts out the enough rows to meet the LIMIT and returns those.

It's like if you wanted to read pages 500-510 in a book, but to do this, you had to read pages 1-499 first. Then when someone asks you to read pages 511-520, and you have to read pages 1-510 over again.

Eventually the offset gets to be so large that it's less expensive to read 14000 rows in a table-scan, than to read 14000 index entries + 14000 rows.

What?!? Is OFFSET really so expensive? Yes, it is. It's much more common to look up rows by value, so MySQL is optimized for that usage.

So if you can reimagine your pagination queries to look up rows by value instead of using LIMIT/OFFSET, you'll be much happier.

For example, suppose you read "page" 1000, and you see that the highest id value on that page is 13999. When the client requests the next page, you can do the query:

SELECT ... FROM mytable WHERE id > 13999 LIMIT 24;

This does the lookup by the value of id, which is optimized because it utilizes the primary key index. Then it reads just 24 rows and returns them (MySQL is at least smart enough to stop reading after it reaches the OFFSET + LIMIT rows).