I have a database with the following columns. I have added some sample data to show formatting.
date, time, amount
2021-10-14, 13:00, 15.40
2021-10-14, 13:01, 9.34
2021-10-14, 13:02, 10.12
2021-10-14, 13:03, 7.44
There are 2.6 million rows in the database spanning two years. Each row is an increment of 1 minute.
I need to write SQL that will output and group rows that are continuous by minute on the same date, where the amount is greater than 8.00 and there are at least 3 consecutive rows.
This would then find an example like:
2021-11-30, 14:44, 8.04
2021-11-30, 14:45, 9.41
2021-11-30, 14:46, 9.27
2021-11-30, 14:47, 10.54
2021-11-30, 14:48, 11.09
2022-03-13, 08:22, 36.44
2022-03-13, 08:23, 17.38
2022-03-13, 08:24, 11.86
So, if I understand correctly, you only want to select the rows that are part of a consecutive sequence of at least 3 minutes where amount >= 8?
I'm not sure about the performance, but this seems to work:
Setup:
CREATE TABLE series
(
    id       INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    date     DATE NOT NULL,
    time     TIME NOT NULL,
    datetime DATETIME GENERATED ALWAYS AS (TIMESTAMP(date, time)),
    amount   DECIMAL(5, 2),
    INDEX (amount)
);
INSERT INTO series (date, time, amount)
VALUES ('2021-11-30', '14:40', 7),
('2021-11-30', '14:41', 8),
('2021-11-30', '14:42', 8),
('2021-11-30', '14:43', 8),
('2021-11-30', '14:44', 8),
('2021-11-30', '14:45', 7),
('2021-11-30', '14:46', 8),
('2021-11-30', '14:47', 8),
('2021-11-30', '14:48', 8),
('2021-11-30', '14:49', 7),
('2021-11-30', '14:50', 8),
('2021-11-30', '14:51', 8),
('2021-11-30', '14:52', 7)
;
The solution:
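SELECT date, time, T.amount
FROM (SELECT date,
             time,
             datetime,
             amount,
             -- Timestamps of the neighbouring qualifying rows; PARTITION BY date
             -- keeps a sequence from continuing across midnight.
             LAG(datetime, 2)  OVER (PARTITION BY date ORDER BY datetime) AS tmin2,
             LAG(datetime, 1)  OVER (PARTITION BY date ORDER BY datetime) AS tmin1,
             LEAD(datetime, 1) OVER (PARTITION BY date ORDER BY datetime) AS tplus1,
             LEAD(datetime, 2) OVER (PARTITION BY date ORDER BY datetime) AS tplus2
      FROM series
      WHERE amount >= 8) T
-- The row is the last, the first, or the middle row of three consecutive minutes.
WHERE TIME_TO_SEC(TIMEDIFF(T.datetime, T.tmin2)) = 120
   OR TIME_TO_SEC(TIMEDIFF(T.datetime, T.tplus2)) = -120
   OR (TIME_TO_SEC(TIMEDIFF(T.datetime, T.tmin1)) = 60 AND TIME_TO_SEC(TIMEDIFF(T.datetime, T.tplus1)) = -60)
ORDER BY datetime;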
Explanation:
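The WHERE clause in the subquery first removes every row with an amount below 8. LAG and LEAD then fetch the datetime of the two previous and the two following remaining rows. The outer WHERE uses those values to see if the current row is part of a 3 min sequence and filters based on that criterion: the row qualifies if it is the last, the first, or the middle row of three consecutive minutes.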
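Since the question also asks to group the rows, here is a sketch of an alternative that numbers each run, using the common "gaps and islands" approach rather than the LAG/LEAD check above. It assumes MySQL 8.0+ and the series table from the setup. The names filtered, sized, grp and run_length are only illustrative, and the reference timestamp '2000-01-01 00:00:00' is an arbitrary choice. Within one date, the minute number minus the row number stays constant as long as the qualifying rows are consecutive, so that difference can serve as a group key.

```sql
WITH filtered AS (
    SELECT date,
           time,
           datetime,
           amount,
           -- Minutes since an arbitrary reference point minus the row's position
           -- among the qualifying rows of that date: constant within one run.
           TIMESTAMPDIFF(MINUTE, '2000-01-01 00:00:00', datetime)
               - ROW_NUMBER() OVER (PARTITION BY date ORDER BY datetime) AS grp
    FROM series
    WHERE amount >= 8
),
sized AS (
    SELECT date,
           time,
           datetime,
           amount,
           grp,
           -- Number of rows in the run this row belongs to.
           COUNT(*) OVER (PARTITION BY date, grp) AS run_length
    FROM filtered
)
SELECT date, time, amount, grp, run_length
FROM sized
WHERE run_length >= 3
ORDER BY datetime;
```

On the sample rows from the setup this should keep 14:41 to 14:44 as one run of 4 and 14:46 to 14:48 as one run of 3, and drop 14:50 to 14:51 because that run has only 2 rows. I have not compared its performance with the LAG/LEAD version on 2.6 million rows.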