I have a table clicks
:
CREATE TABLE `clicks` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`link_id` int(11) NOT NULL,
`date_added` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
PRIMARY KEY (`id`),
KEY `link_id` (`link_id`),
KEY `date_added` (`date_added`)
) ENGINE=InnoDB AUTO_INCREMENT=90899051 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
With the following indexes:
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| clicks | 0 | PRIMARY | 1 | id | A | 79808649 | NULL | NULL | | BTREE | | |
| clicks | 1 | link_id | 1 | link_id | A | 276154 | NULL | NULL | | BTREE | | |
| clicks | 1 | date_added | 1 | date_added | A | 79808649 | NULL | NULL | | BTREE | | |
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
I'm trying to run some analytical queries on this table and I'm finding that it's taking ages to run. Take the following query for example:
SELECT
DISTINCT(link_id) AS link_id
FROM
clicks
WHERE
date_added >= '2016-11-01 00:00:00'
AND date_added <= '2016-12-05 10:16:00'
This query took almost a minute to complete. I found by running an EXPLAIN
on the query that the indexes weren't used.
+----+-------------+--------+-------+---------------+---------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+---------+---------+------+----------+-------------+
| 1 | SIMPLE | clicks | index | date_added | link_id | 4 | NULL | 79786609 | Using where |
+----+-------------+--------+-------+---------------+---------+---------+------+----------+-------------+
I expected that the query would have been run by using the index on the date_added
column to filter down the result set before then pulling out the distinct link_id
s from the result.
Does anyone know why the index is not being used or if there's anything I can do to force it to be used?
Note: This question is part of a bigger issue and is closely related to an unsolved issue I posted last week - MySQL query with JOIN not using INDEX
Edit
Explain on my query, without using any index hints:
EXPLAIN SELECT DISTINCT(link_id) FROM clicks WHERE date_added >= '2016-11-01 00:00:00' AND date_added <= '2016-12-05 23:59:59';
+----+-------------+---------------------------+-------+---------------+---------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------------+-------+---------------+---------+---------+------+----------+-------------+
| 1 | SIMPLE | clicks | index | date_added | link_id | 4 | NULL | 79816660 | Using where |
+----+-------------+---------------------------+-------+---------------+---------+---------+------+----------+-------------+
Explain on my query WITH index hints:
EXPLAIN SELECT DISTINCT(link_id) FROM clicks USE INDEX(date_added) IGNORE INDEX(link_id) WHERE date_added >= '2016-11-01 00:00:00' AND date_added <= '2016-12-05 23:59:59';
+----+-------------+---------------------------+------+---------------+------+---------+------+----------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------------+------+---------------+------+---------+------+----------+------------------------------+
| 1 | SIMPLE | clicks | ALL | date_added | NULL | NULL | NULL | 79816882 | Using where; Using temporary |
+----+-------------+---------------------------+------+---------------+------+---------+------+----------+------------------------------+
Edit 2
Using FORCE INDEX(date_added)
on my query (Query completed even quicker, 12.05 sec):
EXPLAIN SELECT DISTINCT(link_id) FROM clicks FORCE INDEX(date_added) WHERE date_added >= '2016-11-01 00:00:00' AND date_added <= '2016-12-05 23:59:59';
+----+-------------+---------------------------+-------+---------------+------------+---------+------+----------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------------+-------+---------------+------------+---------+------+----------+------------------------------+
| 1 | SIMPLE | clicks | range | date_added | date_added | 4 | NULL | 17277508 | Using where; Using temporary |
+----+-------------+---------------------------+-------+---------------+------------+---------+------+----------+------------------------------+
First of all, it is not true that no index was used. The explain
results clearly indicate that link_id
index was used.
As to why the date_added
index was not used, the answer is simple: MySQL decided against because it thought that link_id
index would be a better choice.
What you need to understand is that there are 2 operations in your query that could be sped-up by an index:
link_id
is returned only once.If you really wanted to optimise this query, then you would create a multi-column index on link_id, date_added
fields. You wrote in a comment to Gordon's answer, that this was not possible.
Therefore, MySQL must decide which one of the 2 operations it is going to speed up using one of the indexes. It decided that link_id
index is the better choice. If you disagree with its decision, then you can use index hints to indicate to MySQL that you prefer to use (use index
or force index
) or ignore (ignore index
) a particular index.
Just instruct MySQL to ignore the link_id
index and use date_added
index. Just make sure that you check the query speed after the adjustments.