I have a table with several million rows of data. I have a primary key on id, and a compound unique key on col2, col3, col4 and my_date. Sample data is shown here...
id col2 col3 col4 my_date
1 1 1 1 2020-01-03 02:00:00
2 1 2 1 2020-01-03 01:00:00
3 1 3 1 2020-01-03 03:00:00
4 2 1 1 2020-02-03 01:00:00
5 2 2 1 2020-02-03 02:00:00
6 2 3 1 2020-02-03 03:00:00
7 3 1 1 2020-03-03 03:00:00
8 3 2 1 2020-03-03 02:00:00
9 3 3 1 2020-03-03 01:00:00
I want to get the min and max dates, grouped by col2.
The rules are...
My idea was to get the min date from the first id and add 1 week to it. Then use this data as a range restricter to get the minimum date for each col2 group, and the opposite for the max dates.
I have created the following query...
SELECT t1.col2, t1.min_date, t2.max_date
FROM (
SELECT min(table.my_date) AS min_date, table.col2 AS col2
FROM table
JOIN (
SELECT table.id AS id, date_add(table.my_date, interval 7 day) AS min_date
FROM table
JOIN (
SELECT min(table.id) AS min_id
FROM table
WHERE table.col4 = 1
) AS t3
ON table.id = t3.min_id
) AS t3
ON table.my_date < t3.min_date
GROUP BY table.col2
) AS t1
JOIN (
SELECT max(table.my_date) AS max_date, table.col2 AS col2
FROM table
JOIN (
SELECT table.id AS id, date_sub(table.my_date, interval 7 day) AS max_date
FROM table
JOIN (
SELECT max(table.id) AS max_id
FROM table
WHERE table.col4 = 1
) AS t5
ON table.id = t5.max_id
) AS t4
ON table.my_date > t4.max_date
GROUP BY table.col2
) AS t2
ON t1.col2 = t2.col2
The query runs OK and returns the correct results. It is currently taking about 12 seconds to run.
Is there a better way to write this query to improve the performance/readability?
"I want to get the min and max dates, grouped by col2." -- Then simply have
INDEX(col2, my_date)
You currently have UNIQUE(col2, col3, col4, my_date)
, correct? If you are willing to shuffle that, you won't need my extra index: UNIQUE(col2, my_date, col3, col4)
.
Then, this is quite efficient:
SELECT col2, MIN(my_date), MAX(my_date)
FROM table
GROUP BY col2;
EXPLAIN SELECT ...
will show that the index is "covering" by saying "Using index".