Search code examples
sqlprestogaps-and-islandstrino

Temporal clustering in SQL


I have a list of timestamps that I'd like to cluster together. For example, given the following timestamps:

2022-01-02T03:04:00
2022-01-02T03:05:00
2023-04-05T06:07:00
2023-04-05T06:08:00
2023-04-05T06:10:00

I would like to return something like:

start                  end
2022-01-02T03:04:00    2022-01-02T03:05:00
2023-04-05T06:07:00    2023-04-05T06:10:00

Is this possible in modern SQL?


Solution

  • You can do that kind of analysis with MATCH_RECOGNIZE. For instance, if you want to group events that are within 1 day of each other in the same cluster, you could do this:

    WITH data(t) AS (
        VALUES
         TIMESTAMP '2022-01-02 03:04:00',
         TIMESTAMP '2022-01-02 03:05:00',
         TIMESTAMP '2023-04-05 06:07:00',
         TIMESTAMP '2023-04-05 06:08:00',
         TIMESTAMP '2023-04-05 06:10:00'
    )
    SELECT *
    FROM data
    MATCH_RECOGNIZE (
        ORDER BY t
        MEASURES
            FIRST(A.t) AS start_time,
            LAST(A.t) AS end_time
        ONE ROW PER MATCH
        AFTER MATCH SKIP PAST LAST ROW
        PATTERN (A+)
        DEFINE
            A AS t - FIRST(t) < INTERVAL '1' DAY
    )
    

    You can adjust the t - FIRST(t) < INTERVAL '1' DAY if you want different criteria for grouping the entries.

    See these for more details on how MATCH_RECOGNIZE works: