Search code examples
mysqlmariadbquery-optimization

Mariadb / mysql EXPLAIN to optimize data query


I am using mariadb to store a large number of measurements. A common query I do selects for a data range. It takes a little long, something like 10 to 20 seconds, and it gets worse and worse, so I was playing around with my indices. I use EXPLAIN to see if the indices work.

But now EXPLAIN gives me a confusing output. It seems that what it does depends on the actual date range I select. For July it does something reasonable, for June not. I see it also in the query times: July is much faster. What could be the cause?

That is the output. I did not change anything else between these to queries. Only the date range. But somehow, 'type' is different, 'key' is different, 'key_len' is different, and most notably, it scans all 16 mio entries of the table.

EXPLAIN SELECT * FROM eurostat_dump WHERE station_name='Izana' AND last_update BETWEEN '2024-07-01' AND '2024-07-30';
+------+-------------+---------------+-------+--------------------------+-------------+---------+------+---------+-------------+
| id   | select_type | table         | type  | possible_keys            | key         | key_len | ref  | rows    | Extra       |
+------+-------------+---------------+-------+--------------------------+-------------+---------+------+---------+-------------+
|    1 | SIMPLE      | eurostat_dump | range | last_update,station_name | last_update | 3075    | NULL | 1972882 | Using where |
+------+-------------+---------------+-------+--------------------------+-------------+---------+------+---------+-------------+
MariaDB [slr_stats]> EXPLAIN SELECT * FROM eurostat_dump WHERE station_name='Izana' AND last_update BETWEEN '2024-06-01' AND '2024-06-30';
+------+-------------+---------------+------+--------------------------+------+---------+------+----------+-------------+
| id   | select_type | table         | type | possible_keys            | key  | key_len | ref  | rows     | Extra       |
+------+-------------+---------------+------+--------------------------+------+---------+------+----------+-------------+
|    1 | SIMPLE      | eurostat_dump | ALL  | last_update,station_name | NULL | NULL    | NULL | 16446501 | Using where |
+------+-------------+---------------+------+--------------------------+------+---------+------+----------+-------------+

EDIT:

This is the table layout:

| eurostat_dump | CREATE TABLE `eurostat_dump` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `station_name` text DEFAULT NULL,
  `last_update` text DEFAULT NULL,
  `target_name` text DEFAULT NULL,
  `status_code` text DEFAULT NULL,
  `returns` int(11) DEFAULT NULL,
  `prediction` text DEFAULT NULL,
  PRIMARY KEY (`ID`),
  KEY `last_update` (`last_update`(768)),
  KEY `station_name` (`station_name`(768))
) ENGINE=InnoDB AUTO_INCREMENT=17848007 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci |

Solution

    • If last_update is a date, use DATE or DATETIME or DATETIME(6). Saves space and avoids prefixing issues.
    • Never use TINYTEXT -- it is worse than the equivalent VARCHAR(255).
    • When you have INDEX(a,b), don't also have INDEX(a).
    • Normalize station_name and target_name to an INT of the smallest practical size.

    Most of the above save space, hence improving speed.