today I ran into a performance issue with a MariaDB query for this scenario: I have a table called stations where gas stations are saved (around 17k entries) with post_code, uuid, name, and other info that aren't used in this case. Then there's another table called prices (~210Mio entries) where a single entry represents a single price change for a single station with its specific date time and station_uuid.
First I needed to retrieve the min price for diesel in a 30-minute interval of a single day. This works fine. But then I needed to add the corresponding stations' names of the station the price change happened. So I added the specific join Logic but then remembered that it's not guaranteed to use the correct station name when grouping by time. So I tried to use a subquery but that's resulting in a timeout error.
I'm using single-column BTREE indexes on prices.diesel, prices.date, stations.post_code.
Here are my so far tried SQL Queries:
SELECT
MIN(diesel) AS "Minimal",
date - interval minute(date)%30 minute AS time
FROM prices
LEFT JOIN stations
ON prices.station_uuid = stations.uuid
WHERE `date` BETWEEN (SELECT FROM_UNIXTIME(1684627200)) AND (SELECT FROM_UNIXTIME(1684713599))
AND diesel > 0.0
AND post_code = 59929
GROUP BY date_format(time, '%Y-%m-%d %H:%i');
SELECT
MIN(diesel) AS "Minimal",
name AS "Tankstelle"
date - interval minute(date)%30 minute AS time
FROM prices
LEFT JOIN stations
ON prices.station_uuid = stations.uuid
WHERE `date` BETWEEN (SELECT FROM_UNIXTIME(1684627200)) AND (SELECT FROM_UNIXTIME(1684713599))
AND diesel > 0.0
AND post_code = 33106
GROUP BY date_format(time, '%Y-%m-%d %H:%i');
SELECT
stations.name AS "Tankstelle",
date - interval minute(date)%30 minute as time
FROM prices
LEFT JOIN stations
ON prices.station_uuid = stations.uuid
WHERE `date` BETWEEN (SELECT FROM_UNIXTIME(1684627200)) AND (SELECT FROM_UNIXTIME(1684713599))
AND diesel > 0.0
AND post_code = 59929
AND diesel = (
SELECT MIN(diesel) as Min
FROM prices p
WHERE p.date BETWEEN (prices.date - interval minute(prices.date)%30 minute) AND (prices.date - interval ((minute(prices.date)%30)+30) minute))
EDIT to answer first three comments (23.05.2023):
MariaDB version: 10.11.3-MariaDB-1:10.11.3+maria~ubu2204
"diesel" is the price of diesel but if it is a change for e10 then the diesel will be 0 so to avoid using those rows I filter for 0 values
May sound a bit ignorant, but I don't know exactly how "with ties" works, so I don't have an opinion yet. And the clustered key is the primary key, isn't it? If yes it should be answered with the table definitions. If not then it sounds even more ignorant since you would need to explain to me how it differs from the primary key.
Thanks for the help in advance.
The result of EXPLAIN is the following:
Query 1:
+------+-------------+----------+--------+-------------------+---------+---------+------------------------------+--------+---------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------+--------+-------------------+---------+---------+------------------------------+--------+---------------------------------------------------------------------+
| 1 | SIMPLE | prices | range | date,diesel | date | 5 | NULL | 695600 | Using index condition; Using where; Using temporary; Using filesort |
| 1 | SIMPLE | stations | eq_ref | PRIMARY,post_code | PRIMARY | 16 | tankonix.prices.station_uuid | 1 | Using where |
+------+-------------+----------+--------+-------------------+---------+---------+------------------------------+--------+---------------------------------------------------------------------+
Query 2:
+------+-------------+----------+--------+-------------------+---------+---------+------------------------------+--------+---------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------+--------+-------------------+---------+---------+------------------------------+--------+---------------------------------------------------------------------+
| 1 | SIMPLE | prices | range | date,diesel | date | 5 | NULL | 695600 | Using index condition; Using where; Using temporary; Using filesort |
| 1 | SIMPLE | stations | eq_ref | PRIMARY,post_code | PRIMARY | 16 | tankonix.prices.station_uuid | 1 | Using where |
+------+-------------+----------+--------+-------------------+---------+---------+------------------------------+--------+---------------------------------------------------------------------+
Query 3: explain result
+------+--------------------+----------+--------+-------------------+---------+---------+------------------------------+-----------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------------+----------+--------+-------------------+---------+---------+------------------------------+-----------+------------------------------------+
| 1 | PRIMARY | prices | range | date,diesel | date | 5 | NULL | 695600 | Using index condition; Using where |
| 1 | PRIMARY | stations | eq_ref | PRIMARY,post_code | PRIMARY | 16 | tankonix.prices.station_uuid | 1 | Using where |
| 4 | DEPENDENT SUBQUERY | p | ALL | date | NULL | NULL | NULL | 207323157 | Using where |
+------+--------------------+----------+--------+-------------------+---------+---------+------------------------------+-----------+------------------------------------+
My Table Definitions:
prices: table prices indexes prices
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table
|
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| prices | CREATE TABLE `prices` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`date` datetime NOT NULL,
`station_uuid` text NOT NULL,
`diesel` float NOT NULL,
`e10` float NOT NULL,
`e5` float NOT NULL,
`dieselchange` tinyint(4) NOT NULL,
`e5change` tinyint(4) NOT NULL,
`e10change` tinyint(4) NOT NULL,
PRIMARY KEY (`id`),
KEY `date` (`date`) USING HASH,
KEY `e5` (`e5`),
KEY `e10` (`e10`),
KEY `diesel` (`diesel`)
) ENGINE=InnoDB AUTO_INCREMENT=381367318 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci |
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
stations: table stations indexes stations
+----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table
|
+----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| stations | CREATE TABLE `stations` (
`uuid` uuid NOT NULL,
`name` varchar(127) NOT NULL,
`brand` varchar(127) DEFAULT NULL,
`street` varchar(127) NOT NULL,
`house_number` varchar(7) NOT NULL DEFAULT '',
`post_code` varchar(5) NOT NULL,
`city` varchar(31) NOT NULL,
`latitude` double NOT NULL,
`longitude` double NOT NULL,
`openingtimes_json` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL CHECK (json_valid(`openingtimes_json`)),
`first_active` date NOT NULL,
PRIMARY KEY (`uuid`),
KEY `post_code` (`post_code`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci |
+----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
EDIT (09.06.2023):
I succeeded in Optimizing it enough. First I completed the steps mentioned in the marked answer, then it went much better but not fast enough so I proceeded to try different approaches and succeeded with:
SELECT
stations.name AS "Tankstelle",
date - interval minute(date)%30 minute as time_begin,
date - interval minute(date)%30 - 30 minute as time_end
FROM stations
JOIN prices
ON prices.station_uuid = stations.uuid
JOIN (SELECT MIN(diesel) as price,
date_format(date - interval minute(date)%30 minute, "%Y-%m-%d %H:%i:59") as time
FROM prices p
JOIN stations
ON stations.uuid = p.station_uuid
WHERE p.date BETWEEN FROM_UNIXTIME(1684627200) AND FROM_UNIXTIME(1684713599)
AND diesel > 0
AND stations.post_code = 59929
GROUP BY date_format(time, '%Y-%m-%d %H:%i')) as min
ON min.time Between (date - interval minute(date)%30 minute) and (date - interval minute(date)%30 - 30 minute)
WHERE `date` BETWEEN FROM_UNIXTIME(1684627200) AND FROM_UNIXTIME(1684713599)
AND diesel > 0.0
AND post_code = 59929;
like this it gives me the name of the station corresponding to the cheapest diesel price in the given city grouped for each half an hour.
BETWEEN ( SELECT FROM_UNIXTIME(1684627200) ) AND ...
-->
BETWEEN FROM_UNIXTIME(1684627200) AND ...
That is, simply use the expression; you don't need to SELECT
it.
Since the WHERE
clause tests columns from different tables, the Optimizer will pick one of the tables (p
vs s
) to start with. It may goof.
Since there is a test on s.PostCode
in the WHERE
, the LEFT JOIN
is really a JOIN
. Please change that to help humans read the query. Being a JOIN opens the possibility of starting with s
instead of being forced to start with p
.
FROM prices AS p
JOIN stations AS s ON s.uuid = p.station_uuid
WHERE p.date BETWEEN FROM_UNIXTIME(1684627200)
AND FROM_UNIXTIME(1684713599)
AND p.diesel > 0.0
AND s.post_code = 1
Starting with s:
s: (you effectively have this)
INDEX(post_code, -- because of =
uuid) -- for "covering"
p: (you have this)
INDEX(station_uuid, -- =
date, -- assuming it is more selective than diesel
diesel) -- covering
Starting with p:
p: INDEX(date, diesel, station_uuid) -- covering, but otherwise unexciting
s: the PRIMARY KEY(uuid) will be used effectively.
Recommendation (after changing to JOIN
): the second p
index.
Notes:
s
and p
depending on crude statistics it has.PRIMARY KEY
(so it can get to the rest of the rows). Hence (post_code, uuid)
is the same as (post_code)
, which you already have.OVER(PARTITION BY...)
has any magical performance, so I worry that that will be the slowest part of the query.