Search code examples
sqltimestampsnowflake-cloud-data-platformdatediff

Determining how far away is the next ID


So I have some data, subset as follows:

ID  data start_time 
001    X 2021-12-29 10:54:12.429 +0000
002    Y 2022-01-16 05:07:55.708 +0000 
003    Y 2021-12-31 12:25:12.980 +0000
002    A 2022-01-03 12:49:41.866 +0000
001    A 2021-12-30 16:32:13.736 +0000
001    A 2022-01-17 10:10:10.736 +0000

I would like to determine in minutes, the time difference between a given ID and the next occurrence in the dataframe, in order of start_time. So if an ID appears at 12:00 and 12:01, I would like the ID to show the time of the next entry as well as the diff in minutes, using SQL/Snowflake. CTE preferred.

The following fields should be added:

  • next_timestamp: The timestamp of the following entry
  • time_diff: The difference in minutes between start_time and next_timestamp.
  • entry_order: The number of how many of this ID has been.

Expected output:

ID  data start_time                       next_timestamp                 time_diff  entry_order
001    X 2021-12-29 10:54:12.429 +0000    2021-12-30 16:32:13.736 +0000  1778       1
001    A 2021-12-30 16:32:13.736 +0000    2022-01-17 10:10:10.736 +0000  25537      2
003    Y 2021-12-31 12:25:12.980 +0000    NULL                           NULL       1
002    A 2022-01-03 12:49:41.866 +0000    2022-01-16 05:07:55.708 +0000  18258      1
002    Y 2022-01-16 05:07:55.708 +0000    NULL                           NULL       2
001    A 2022-01-17 10:10:10.736 +0000    NULL                           NULL       3

Notice, the resulting output is ordered by the timestamp, ascending.


Solution

  • The LEAD function can be used to find the next start_time per ID.

    And the ROW_NUMBER function can return a unique sequencial number per ID.

    SELECT *
    , LEAD(start_time) OVER (PARTITION BY ID ORDER BY start_time) AS next_timestamp
    , DATEDIFF(minute, start_time, LEAD(start_time) OVER (PARTITION BY ID ORDER BY start_time)) AS time_diff
    , ROW_NUMBER() OVER (PARTITION BY ID ORDER BY start_time) AS entry_order
    FROM your_table
    ORDER BY start_time