This is an example of "High Performance MySQL 3rd".
mysql> EXPLAIN SELECT * FROM products WHERE actor='SEAN CARREY' AND title like '%APOLLO%';
The book said that MySQL can't perform the LIKE as below.
MySQL can’t perform the LIKE operation in the index. This is a limitation of the low-level storage engine API, which in MySQL 5.5 and earlier allows only simple comparisons (such as equality, inequality, and greater-than) in index operations. MySQL can perform prefix-match LIKE patterns in the index because it can convert them to simple comparisons, but the leading wildcard in the query makes it impossible for the storage engine to evaluate the match. Thus, the MySQL server itself will have to fetch and match on the row’s values, not the index’s values.
After that, the book gave a "deferred join" improvement.
mysql> EXPLAIN SELECT * FROM products
-> JOIN (
-> SELECT prod_id FROM products WHERE actor='SEAN CARREY' AND title LIKE '%APOLLO%'
-> ) AS t1 ON (t1.prod_id=products.prod_id);
Even the (actor, title, prod_id) is a "covering index", MySQL can't also perform the LIKE in the index.
I am so confused!
This is an optimization that works around technical limitations about how MySQL works, and less about logic. Especially your understanding that you cannot use an index to directly find matches for a leading wildcard is correct.
The main problem is that a covering index in MySQL 5.5 technically didn't exactly do what you assume it does (and could do).
To read the quoted statement from the book properly, you have to know that there is a difference between The MySQL server and the underlying storage engine. The MySQL server takes your query, decides how to execute it, sends a request to the (InnoDB) storage engine via an api, and gets back some rows.
So for your first query, MySQL asks InnoDB to give it the following data: all columns (select *
), using an index to look for actor='SEAN CARREY'
. Although it would be nice and you assume that a covering index would do this, unfortunately, it cannot also directly eliminate rows based on title like '%APOLLO%'
, because
This is a limitation of the low-level storage engine API, which in MySQL 5.5 and earlier allows only simple comparisons (such as equality, inequality, and greater-than) in index operations.
Since you asked for *
, it retrieves all columns, which requires to look into the table data, for all rows with the correct actor (using the index) from the InnoDB engine and then filter those afterwards, since
the MySQL server itself will have to fetch and match on the row’s values, not the index’s values.
In the second query, the MySQL server only needs prod_id
(as per request) and title
(to do the where
comparison) from the storage engine. This is now actually covered by the index! Although the upper layer still needs to do the evaluation on title like '%APOLLO%'
, the storage engine now does not need to read the actual table data to fulfill the request for the subquery.
The MySQL server can now evaluate the data it received and send another request to the storage engine to retrieve all columns for the prod_id
that fulfill the where
-condition. In extreme cases this might not filter at all (e.g. every row with actor='SEAN CARREY'
could also fulfill title like '%APOLLO%'
), and then the deferred join could be a bit slower, since you do more work overall.
You think this is not what a covering index should do? You are right. And MySQL 5.6 learned how to do it more properly:
Index Condition Pushdown (ICP) is an optimization for the case where MySQL retrieves rows from a table using an index. Without ICP, the storage engine traverses the index to locate rows in the base table and returns them to the MySQL server which evaluates the WHERE condition for the rows. With ICP enabled, and if parts of the WHERE condition can be evaluated by using only columns from the index, the MySQL server pushes this part of the WHERE condition down to the storage engine.
[...]
MySQL can use the index to scan through people with
zipcode='95054'
. The second part (lastname LIKE '%etrunia%'
) cannot be used to limit the number of rows that must be scanned, so without Index Condition Pushdown, this query must retrieve full table rows for all people who havezipcode='95054'
.With Index Condition Pushdown, MySQL checks the
lastname LIKE '%etrunia%'
part before reading the full table row. This avoids reading full rows corresponding to index tuples that match the zipcode condition but not the lastname condition.
Since it was only needed to workaround technical issues, you do not need a defered join here anymore (although you should not forget about it, it can be useful in other situations). Your explain output for your first query should now include
Using index condition
(JSON property: using_index_condition)Tables are read by accessing index tuples and testing them first to determine whether to read full table rows. In this way, index information is used to defer (“push down”) reading full table rows unless it is necessary. See Section 8.2.1.5, “Index Condition Pushdown Optimization”.