mysql sql performance

How can I optimise this MySQL query that includes a WHERE clause with an epoch time range?


I am trying to optimize the following MySQL query:

SELECT events.id, events.tracking_id, events.event_time, events.event_type_id
FROM events
WHERE events.event_time >= 1564617600000000 AND events.event_time <= 1567295999000000

Here are the events table details:

CREATE TABLE `events` (
  `id` char(36) NOT NULL,
  `tracking_id` char(72) NOT NULL,
  `event_time` bigint(16) NOT NULL,
  `server_id` char(36) NOT NULL,
  `project_id` char(36) NOT NULL,
  `data_type_id` char(36) NOT NULL,
  `event_type_id` char(36) NOT NULL,
  PRIMARY KEY (`tracking_id`,`event_time`),
  KEY `id_idx` (`id`),
  KEY `server_id_idx` (`server_id`),
  KEY `event_type_id_idx` (`event_type_id`),
  KEY `event_time_idx` (`event_time`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

And the EXPLAIN output:

+----+-------------+--------+------------+------+----------------+------+---------+------+---------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys  | key  | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+--------+------------+------+----------------+------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | events | NULL       | ALL  | event_time_idx | NULL | NULL    | NULL | 2877592 |    37.48 | Using where |
+----+-------------+--------+------------+------+----------------+------+---------+------+---------+----------+-------------+

The query takes about 30 seconds to run. Adding an index on event_time doesn't seem to have made any difference to the execution time. It doesn't look like the index is being used.

event_time was originally a char(36), but then I was getting the following warning: 'Cannot use range access on index 'event_time_idx' due to type or collation conversion on field 'event_time''. That warning has disappeared since I converted event_time to a bigint, but the query is still not using the index.

What can I do to improve the performance of this query (which is actually a subquery in a much larger query)?


Solution

  • Do all the rows in your table, or at least a majority of them, match the condition? The timestamps you give, read as microseconds since the epoch, run from 2019-08-01 00:00:00 to 2019-08-31 23:59:59 UTC, so one full month. Are most of the rows currently in your table from this month?

    MySQL does cost-based optimization. For a secondary index such as event_time_idx, it estimates the cost of reading each index entry and then using that entry to look up the row in the clustered (primary key) index. This means two lookups per matching row, plus some overhead.

    MySQL is correct to estimate that a table-scan might be better than using an index in certain cases. The threshold is not documented, but in my experience, if it estimates that the number of matching rows is over 20% of the table, it tends to do a table-scan. YMMV. Your EXPLAIN output estimates that 37.48% of the rows match (the filtered column), which is well above that.

    You can use an index hint to tell MySQL that it should treat a table-scan as very expensive, so if the index can be used at all, it should prefer it.

    SELECT events.id, events.tracking_id, events.event_time, events.event_type_id
    FROM events FORCE INDEX (event_time_idx)
    WHERE events.event_time >= 1564617600000000 AND events.event_time <= 1567295999000000
    

    But keep in mind MySQL's cost-based optimizer might have been right. It might in fact be less costly to do the table-scan, depending on your data.
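
    To check which case applies to your data, something like the following may help. It is only a sketch against the events table and event_time_idx index from the question. The column aliases (total_rows, matching_rows, matching_fraction) are made up for illustration, and the first statement reads the whole table or an index, so expect it to take a while on a table this size.

    ```sql
    -- Share of the table that falls inside the range.
    -- In MySQL a comparison evaluates to 1 or 0, so SUM() counts the matches.
    SELECT COUNT(*) AS total_rows,
           SUM(event_time >= 1564617600000000
               AND event_time <= 1567295999000000) AS matching_rows,
           SUM(event_time >= 1564617600000000
               AND event_time <= 1567295999000000) / COUNT(*) AS matching_fraction
    FROM events;

    -- Plan the optimizer chooses on its own.
    EXPLAIN
    SELECT events.id, events.tracking_id, events.event_time, events.event_type_id
    FROM events
    WHERE events.event_time >= 1564617600000000 AND events.event_time <= 1567295999000000;

    -- Plan with the index hint, for comparison.
    EXPLAIN
    SELECT events.id, events.tracking_id, events.event_time, events.event_type_id
    FROM events FORCE INDEX (event_time_idx)
    WHERE events.event_time >= 1564617600000000 AND events.event_time <= 1567295999000000;
    ```

    If matching_fraction comes out as a large part of the table, the table-scan is probably the reasonable choice. With the hint, I would expect the key column of the second EXPLAIN to show event_time_idx instead of NULL. Then run both versions of the query and compare the actual timings, since the estimates alone won't tell you which one is faster.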