Search code examples
mysqlmariadbsql-order-byquery-optimizationwhere-clause

MySQL: index for query with range and order by


I'm trying to get a query containing a range and ORDER BY ... DESC to use an index.
The index is used if I remove the ORDER BY and just use the range.
The index is also used if I remove the range and just use the ORDER BY.

Table:

CREATE TABLE `messages` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `service` varchar(260) COLLATE utf8mb4_unicode_ci NOT NULL,
  `time` datetime NOT NULL,
  `country` varchar(100) COLLATE utf8mb4_unicode_ci NOT NULL,
  `city` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `country_code` char(2) COLLATE utf8mb4_unicode_ci NOT NULL,
  `issue` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `latitude` double DEFAULT NULL,
  `longitude` double DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `service_time_idx` (`service`,`time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

Query with range and ORDER BY (Not using index):

MariaDB [m]> explain select service, time, country, city, country_code,
 issue, latitude, longitude
   from messages
   where service = 'myservice'
     and time BETWEEN DATE_SUB( NOW() , INTERVAL 24 HOUR ) AND NOW()
   order by time desc;
+------+-------------+---------------------------+-------+------------------+------------------+---------+------+------+-------------+
| id   | select_type | table                     | type  | possible_keys    | key              | key_len | ref  | rows | Extra       |
+------+-------------+---------------------------+-------+------------------+------------------+---------+------+------+-------------+
|    1 | SIMPLE      | messages                  | range | service_time_idx | service_time_idx | 1047    | NULL |    1 | Using where |
+------+-------------+---------------------------+-------+------------------+------------------+---------+------+------+-------------+

Query just using range (using index).

MariaDB [m]> explain select service, time, country, city, country_code,
 issue, latitude, longitude
   from messages
   where service = 'myservice'
     and time BETWEEN DATE_SUB( NOW() , INTERVAL 24 HOUR ) AND NOW();
+------+-------------+---------------------------+-------+------------------+------------------+---------+------+------+-----------------------+
| id   | select_type | table                     | type  | possible_keys    | key              | key_len | ref  | rows | Extra                 |
+------+-------------+---------------------------+-------+------------------+------------------+---------+------+------+-----------------------+
|    1 | SIMPLE      | messages                  | range | service_time_idx | service_time_idx | 1047    | NULL |    1 | Using index condition |
+------+-------------+---------------------------+-------+------------------+------------------+---------+------+------+-----------------------+
1 row in set (0.004 sec)

Query just using ORDER BY (using index):


MariaDB [m]> explain select service, time, country, city, country_code, issue, latitude, longitude
   from messages
   where service = 'myservice'
     and time = '2020-10-03 09:51:25'
   order by time desc;
+------+-------------+---------------------------+------+------------------+------------------+---------+-------------+------+-----------------------+
| id   | select_type | table                     | type | possible_keys    | key              | key_len | ref         | rows | Extra                 |
+------+-------------+---------------------------+------+------------------+------------------+---------+-------------+------+-----------------------+
|    1 | SIMPLE      | messages                  | ref  | service_time_idx | service_time_idx | 1047    | const,const |    1 | Using index condition |
+------+-------------+---------------------------+------+------------------+------------------+---------+-------------+------+-----------------------+
1 row in set, 1 warning (0.001 sec)

Oddly, I can get the query to work with both the range and ORDER BYif I only select on the service and time columns:

MariaDB [m]> explain select service, time from messages
   where service = 'myservice'
     and time BETWEEN DATE_SUB( NOW() , INTERVAL 24 HOUR ) AND NOW()
   order by time desc;
+------+-------------+---------------------------+------+------------------+------------------+---------+-------+------+--------------------------+
| id   | select_type | table                     | type | possible_keys    | key              | key_len | ref   | rows | Extra                    |
+------+-------------+---------------------------+------+------------------+------------------+---------+-------+------+--------------------------+
|    1 | SIMPLE      | messages                  | ref  | service_time_idx | service_time_idx | 1042    | const |    1 | Using where; Using index |
+------+-------------+---------------------------+------+------------------+------------------+---------+-------+------+--------------------------+
1 row in set (0.002 sec)
MariaDB [m]> EXPLAIN FORMAT=JSON select service, time, country, city, country_code, 
issue, latitude, longitude 
  from messages 
where service = 'myservice' 
  and time BETWEEN DATE_SUB( NOW() , INTERVAL 24 HOUR ) AND NOW() 
order by time desc;

{
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "messages",
      "access_type": "range",
      "possible_keys": ["service_time_idx"],
      "key": "service_time_idx",
      "key_length": "1047",
      "used_key_parts": ["service", "time"],
      "rows": 1,
      "filtered": 100,
      "attached_condition": "messages.service = 'myservice' and messages.`time` between <cache>(current_timestamp() - interval 24 hour) and <cache>(current_timestamp())"
    }
  }
} 

How can I get the first query above to use the index?
Obviously I can write the query a different way if I need to.


Solution

  • Your INDEX(service, time) is perfect. EXPLAIN is somewhat cryptic; Let me elaborate;

    (But first, time is a DATETIME, so time = '12:10' does not make sense.)

    So that we can see them all at once:

    1  where service = 'myservice'
         and time BETWEEN DATE_SUB( NOW() , INTERVAL 24 HOUR ) AND NOW()
       order by time desc;
    
    2  where service = 'myservice'
         and time BETWEEN DATE_SUB( NOW() , INTERVAL 24 HOUR ) AND NOW();
    
    3  where service = 'myservice'
         and time = '2020-10-03 09:51:25'
       order by time desc;
    
    4  where service = 'myservice'
         and time BETWEEN DATE_SUB( NOW() , INTERVAL 24 HOUR ) AND NOW()
       order by time desc;
    

    What happens when you change #3 to test against a datetime?

    All of them have

       where service = 'myservice'
    

    so it is good to have INDEX(service, ...)

    All of them then have a reference, either via a "range" or "=", to time, so that is a good column to have next: INDEX(service, time)

    Now, let's see if the ORDER BY (where present) can take advantage of the INDEX(service, time).

    • The entire WHERE has been handled, so it is possible to move onto GROUP BY (not present in your case) and ORDER BY.
    • EXPLAIN give no clue of whether it is used or not. (Or at least not a definitive clue)
    • EXPLAIN FORMAT=JSON will give a strong clue. Run that.
    • I will say that in all of your ORDER BY cases, the index will be used.
    • Here's a technique for determining it by counting the rows touched: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#handler_counts

    Perhaps this is confusing things:

    • Extra = Using index says that the index is "covering" which means that all the columns anywhere in the SELECT are included in the INDEX(...).
    • Extra = Using index condition refers to "ICP", which is mostly irrelevant to this discussion.
    • Neither of the above in Extra -- this does not say anything about whether some INDEX is being used.
    • Type=All (which you don't have) means that no INDEX is used. Note: The optimizer sometimes shuns all indexes and chooses to read "All" rows.
    • Type other than All means that an index was at least partially used.