I have a list of timestamps that I'd like to cluster together. For example, given the following timestamps:
2022-01-02T03:04:00
2022-01-02T03:05:00
2023-04-05T06:07:00
2023-04-05T06:08:00
2023-04-05T06:10:00
I would like to return something like:
start end
2022-01-02T03:04:00 2022-01-02T03:05:00
2023-04-05T06:07:00 2023-04-05T06:10:00
Is this possible in modern SQL?
You can do that kind of analysis with MATCH_RECOGNIZE
. For instance, if you want to group events that are within 1 day of each other in the same cluster, you could do this:
WITH data(t) AS (
VALUES
TIMESTAMP '2022-01-02 03:04:00',
TIMESTAMP '2022-01-02 03:05:00',
TIMESTAMP '2023-04-05 06:07:00',
TIMESTAMP '2023-04-05 06:08:00',
TIMESTAMP '2023-04-05 06:10:00'
)
SELECT *
FROM data
MATCH_RECOGNIZE (
ORDER BY t
MEASURES
FIRST(A.t) AS start_time,
LAST(A.t) AS end_time
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (A+)
DEFINE
A AS t - FIRST(t) < INTERVAL '1' DAY
)
You can adjust the t - FIRST(t) < INTERVAL '1' DAY
if you want different criteria for grouping the entries.
See these for more details on how MATCH_RECOGNIZE
works: