Search code examples
mysqlsqlindexingdatabase-performancequery-performance

MySQL select distinct query not using index


I have a table clicks:

CREATE TABLE `clicks` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `link_id` int(11) NOT NULL,
  `date_added` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
  PRIMARY KEY (`id`),
  KEY `link_id` (`link_id`),
  KEY `date_added` (`date_added`)
) ENGINE=InnoDB AUTO_INCREMENT=90899051 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

With the following indexes:

+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| clicks |          0 | PRIMARY    |            1 | id          | A         |    79808649 |     NULL | NULL   |      | BTREE      |         |               |
| clicks |          1 | link_id    |            1 | link_id     | A         |      276154 |     NULL | NULL   |      | BTREE      |         |               |
| clicks |          1 | date_added |            1 | date_added  | A         |    79808649 |     NULL | NULL   |      | BTREE      |         |               |
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

I'm trying to run some analytical queries on this table and I'm finding that it's taking ages to run. Take the following query for example:

SELECT
    DISTINCT(link_id) AS link_id
FROM
    clicks
WHERE
    date_added >= '2016-11-01 00:00:00'
AND date_added <= '2016-12-05 10:16:00'

This query took almost a minute to complete. I found by running an EXPLAIN on the query that the indexes weren't used.

+----+-------------+--------+-------+---------------+---------+---------+------+----------+-------------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref  | rows     | Extra       |
+----+-------------+--------+-------+---------------+---------+---------+------+----------+-------------+
|  1 | SIMPLE      | clicks | index | date_added    | link_id | 4       | NULL | 79786609 | Using where |
+----+-------------+--------+-------+---------------+---------+---------+------+----------+-------------+

I expected that the query would have been run by using the index on the date_added column to filter down the result set before then pulling out the distinct link_ids from the result.

Does anyone know why the index is not being used or if there's anything I can do to force it to be used?

Note: This question is part of a bigger issue and is closely related to an unsolved issue I posted last week - MySQL query with JOIN not using INDEX


Edit

Explain on my query, without using any index hints:

EXPLAIN SELECT DISTINCT(link_id) FROM clicks WHERE date_added >= '2016-11-01 00:00:00' AND date_added <= '2016-12-05 23:59:59';
+----+-------------+---------------------------+-------+---------------+---------+---------+------+----------+-------------+
| id | select_type | table                     | type  | possible_keys | key     | key_len | ref  | rows     | Extra       |
+----+-------------+---------------------------+-------+---------------+---------+---------+------+----------+-------------+
|  1 | SIMPLE      | clicks                    | index | date_added    | link_id | 4       | NULL | 79816660 | Using where |
+----+-------------+---------------------------+-------+---------------+---------+---------+------+----------+-------------+

Explain on my query WITH index hints:

EXPLAIN SELECT DISTINCT(link_id) FROM clicks USE INDEX(date_added) IGNORE INDEX(link_id) WHERE date_added >= '2016-11-01 00:00:00' AND date_added <= '2016-12-05 23:59:59';
+----+-------------+---------------------------+------+---------------+------+---------+------+----------+------------------------------+
| id | select_type | table                     | type | possible_keys | key  | key_len | ref  | rows     | Extra                        |
+----+-------------+---------------------------+------+---------------+------+---------+------+----------+------------------------------+
|  1 | SIMPLE      | clicks                    | ALL  | date_added    | NULL | NULL    | NULL | 79816882 | Using where; Using temporary |
+----+-------------+---------------------------+------+---------------+------+---------+------+----------+------------------------------+

Edit 2

Using FORCE INDEX(date_added) on my query (Query completed even quicker, 12.05 sec):

EXPLAIN SELECT DISTINCT(link_id) FROM clicks FORCE INDEX(date_added) WHERE date_added >= '2016-11-01 00:00:00' AND date_added <= '2016-12-05 23:59:59';
+----+-------------+---------------------------+-------+---------------+------------+---------+------+----------+------------------------------+
| id | select_type | table                     | type  | possible_keys | key        | key_len | ref  | rows     | Extra                        |
+----+-------------+---------------------------+-------+---------------+------------+---------+------+----------+------------------------------+
|  1 | SIMPLE      | clicks                    | range | date_added    | date_added | 4       | NULL | 17277508 | Using where; Using temporary |
+----+-------------+---------------------------+-------+---------------+------------+---------+------+----------+------------------------------+

Solution

  • First of all, it is not true that no index was used. The explain results clearly indicate that link_id index was used.

    As to why the date_added index was not used, the answer is simple: MySQL decided against because it thought that link_id index would be a better choice.

    What you need to understand is that there are 2 operations in your query that could be sped-up by an index:

    1. Filtering on the data range
    2. Making sure that each link_id is returned only once.

    If you really wanted to optimise this query, then you would create a multi-column index on link_id, date_added fields. You wrote in a comment to Gordon's answer, that this was not possible.

    Therefore, MySQL must decide which one of the 2 operations it is going to speed up using one of the indexes. It decided that link_id index is the better choice. If you disagree with its decision, then you can use index hints to indicate to MySQL that you prefer to use (use index or force index) or ignore (ignore index) a particular index.

    Just instruct MySQL to ignore the link_id index and use date_added index. Just make sure that you check the query speed after the adjustments.