Search code examples
sqlmysqlperformancequery-optimization

MySQL not using an index when using SELECT *


I'm having some trouble adding a new index to my table and I'm wondering why MySQL isn't using the new index.

I noticed that it only happens when my SELECT statement requests other colums which aren't part of the index (e.g. SELECT * doesn't work while SELECT id, otherId does work).

Can someone explain to me why MySQL chooses to do a table scan instead of using the index?

Database schema

The table consists of multiple columns, namely id and otherId. id is my primary key while otherId should get indexed as well. otherId can be null and is unique.

model Entity {
    id      String  @id @default(cuid())
    otherId String? @unique
    // Few more ...
}

Cardinality and selectivity

SELECT
  COUNT(DISTINCT Entity.otherId) as cardinality,
  COUNT(*) as totalRows,
  COUNT(Entity.otherId) as nonNullRows,
  COUNT(DISTINCT Entity.otherId) / COUNT(*) as selectivity
FROM
  Entity
cardinality totalRows nonNullRows selectivity
171 1187 171 0.1441

EXPLAIN SELECT *

This one doesn't use the index but I wish it would.

EXPLAIN SELECT * FROM Entity WHERE Entity.otherId = ?;
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE Entity ALL Entity_otherId_key 1187 10 Using where

EXPLAIN SELECT id, otherId

This one does use the index.

EXPLAIN SELECT id, otherId FROM Entity WHERE Entity.otherId = ?;
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE Entity index Entity_otherId_key Entity_otherId_key 767 1187 10 Using where; Using index

SHOW INDEXES

SHOW INDEXES from Entity;
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment Visible Expression
Entity 0 PRIMARY 1 id A 1187 BTREE YES
Entity 0 Entity_otherId_key 1 otherId A 172 YES BTREE YES

MySQL version

8.2.0


Solution

  • If a query does no row restriction (WHERE clause) or sorting (ORDER BY clause), it is bound to examine every row.

    This is supported by the rows column of the two EXPLAIN results. It show 1187 even in the second example.

    But what's the difference between type: ALL and type: index?

    The first is a table-scan. The query will examine every row in the table. It must do this to count them, because of MVCC. Thus the cost of the query is directly proportional to the number of rows.

    The second is an index-scan. This is slightly better than a table-scan, but it still needs to examine every entry in the named index to count them. The cost is also directly proportional to the number of rows, but it may be able to examine fewer pages to read the index alone instead of the whole table.

    The note in the EXPLAIN "Using index" leads you to believe it's optimized. Strictly speaking it's true that it's using an index for this query, but it's not using the index in the same way that it would for searching or sorting.


    In cases where a query might use an index for a row restriction, the optimizer may choose not to do so. A common case is that the specific value you search for is estimated to occur on a large proportion of rows. The optimizer decides that doing an index lookup for that value and then a second lookup to get the row is likely to be more costly than just doing the table-scan, as if no index were present.

    We can think of the index at the back of a book. Why are common words not indexed? Because it would be needless overhead to look up a common word that occurs on many pages. It's just as much work (i.e. turning pages of the book) to read the book cover-to-cover.

    In the case of MySQL's optimizer, if the optimizer thinks the value you search for occurs on about 20% of the rows of the table, then it is likely to skip the index. This threshold is not documented, and may be affected by other optimizer analysis. It's just a rough threshold I've observed.