sql mysql performance query-optimization

MySQL not using an index when using SELECT *

I'm having some trouble adding a new index to my table and I'm wondering why MySQL isn't using the new index.

I noticed that it only happens when my SELECT statement requests other colums which aren't part of the index (e.g. SELECT * doesn't work while SELECT id, otherId does work).

Can someone explain to me why MySQL chooses to do a table scan instead of using the index?

Database schema

The table consists of multiple columns, namely id and otherId. id is my primary key while otherId should get indexed as well. otherId can be null and is unique.

model Entity {
    id      String  @id @default(cuid())
    otherId String? @unique
    // Few more ...
}

Cardinality and selectivity

SELECT
  COUNT(DISTINCT Entity.otherId) as cardinality,
  COUNT(*) as totalRows,
  COUNT(Entity.otherId) as nonNullRows,
  COUNT(DISTINCT Entity.otherId) / COUNT(*) as selectivity
FROM
  Entity

cardinality	totalRows	nonNullRows	selectivity
171	1187	171	0.1441

EXPLAIN SELECT *

This one doesn't use the index but I wish it would.

EXPLAIN SELECT * FROM Entity WHERE Entity.otherId = ?;

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	Entity		ALL	Entity_otherId_key				1187	10	Using where

EXPLAIN SELECT id, otherId

This one does use the index.

EXPLAIN SELECT id, otherId FROM Entity WHERE Entity.otherId = ?;

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	Entity		index	Entity_otherId_key	Entity_otherId_key	767		1187	10	Using where; Using index

SHOW INDEXES

SHOW INDEXES from Entity;

Table	Non_unique	Key_name	Seq_in_index	Column_name	Collation	Cardinality	Sub_part	Packed	Null	Index_type	Comment	Index_comment	Visible	Expression
Entity	0	PRIMARY	1	id	A	1187				BTREE			YES
Entity	0	Entity_otherId_key	1	otherId	A	172			YES	BTREE			YES

MySQL version

8.2.0

Solution

If a query does no row restriction (WHERE clause) or sorting (ORDER BY clause), it is bound to examine every row.

This is supported by the rows column of the two EXPLAIN results. It show 1187 even in the second example.

But what's the difference between type: ALL and type: index?

The first is a table-scan. The query will examine every row in the table. It must do this to count them, because of MVCC. Thus the cost of the query is directly proportional to the number of rows.

The second is an index-scan. This is slightly better than a table-scan, but it still needs to examine every entry in the named index to count them. The cost is also directly proportional to the number of rows, but it may be able to examine fewer pages to read the index alone instead of the whole table.

The note in the EXPLAIN "Using index" leads you to believe it's optimized. Strictly speaking it's true that it's using an index for this query, but it's not using the index in the same way that it would for searching or sorting.

In cases where a query might use an index for a row restriction, the optimizer may choose not to do so. A common case is that the specific value you search for is estimated to occur on a large proportion of rows. The optimizer decides that doing an index lookup for that value and then a second lookup to get the row is likely to be more costly than just doing the table-scan, as if no index were present.

We can think of the index at the back of a book. Why are common words not indexed? Because it would be needless overhead to look up a common word that occurs on many pages. It's just as much work (i.e. turning pages of the book) to read the book cover-to-cover.

In the case of MySQL's optimizer, if the optimizer thinks the value you search for occurs on about 20% of the rows of the table, then it is likely to skip the index. This threshold is not documented, and may be affected by other optimizer analysis. It's just a rough threshold I've observed.