Search code examples
mysqlmaxgroupingmin

MySql query tuning for min/max dates by group


I have a table with several million rows of data. I have a primary key on id, and a compound unique key on col2, col3, col4 and my_date. Sample data is shown here...

id   col2 col3 col4 my_date
1    1    1    1    2020-01-03 02:00:00
2    1    2    1    2020-01-03 01:00:00
3    1    3    1    2020-01-03 03:00:00
4    2    1    1    2020-02-03 01:00:00
5    2    2    1    2020-02-03 02:00:00
6    2    3    1    2020-02-03 03:00:00
7    3    1    1    2020-03-03 03:00:00
8    3    2    1    2020-03-03 02:00:00
9    3    3    1    2020-03-03 01:00:00

I want to get the min and max dates, grouped by col2.

The rules are...

  1. For each col4 there will be multiple col2's
  2. For each col2 there will be multiple col3's
  3. The dates associated with col2 can be in any order
  4. The min/max dates for each group will all be within 7 days of each other.

My idea was to get the min date from the first id and add 1 week to it. Then use this data as a range restricter to get the minimum date for each col2 group, and the opposite for the max dates.

I have created the following query...

SELECT t1.col2, t1.min_date, t2.max_date
FROM (
    SELECT min(table.my_date) AS min_date, table.col2 AS col2
    FROM table
    JOIN (
        SELECT table.id AS id, date_add(table.my_date, interval 7 day) AS min_date
        FROM table
        JOIN (
            SELECT min(table.id) AS min_id
            FROM table
            WHERE table.col4 = 1
        ) AS t3
        ON table.id = t3.min_id
    ) AS t3
    ON table.my_date < t3.min_date
    GROUP BY table.col2
) AS t1
JOIN (
    SELECT max(table.my_date) AS max_date, table.col2 AS col2
    FROM table
    JOIN (
        SELECT table.id AS id, date_sub(table.my_date, interval 7 day) AS max_date
        FROM table
        JOIN (
            SELECT max(table.id) AS max_id
            FROM table
            WHERE table.col4 = 1
        ) AS t5
        ON table.id = t5.max_id
    ) AS t4
    ON table.my_date > t4.max_date
    GROUP BY table.col2
) AS t2
ON t1.col2 = t2.col2

The query runs OK and returns the correct results. It is currently taking about 12 seconds to run.

Is there a better way to write this query to improve the performance/readability?


Solution

  • "I want to get the min and max dates, grouped by col2." -- Then simply have

    INDEX(col2, my_date)
    

    You currently have UNIQUE(col2, col3, col4, my_date), correct? If you are willing to shuffle that, you won't need my extra index: UNIQUE(col2, my_date, col3, col4).

    Then, this is quite efficient:

    SELECT col2, MIN(my_date), MAX(my_date)
        FROM table
        GROUP BY col2;
    

    EXPLAIN SELECT ... will show that the index is "covering" by saying "Using index".