So I have some data, subset as follows:
ID data start_time
001 X 2021-12-29 10:54:12.429 +0000
002 Y 2022-01-16 05:07:55.708 +0000
003 Y 2021-12-31 12:25:12.980 +0000
002 A 2022-01-03 12:49:41.866 +0000
001 A 2021-12-30 16:32:13.736 +0000
001 A 2022-01-17 10:10:10.736 +0000
I would like to determine in minutes, the time difference between a given ID
and the next occurrence in the dataframe, in order of start_time
. So if an ID
appears at 12:00 and 12:01, I would like the ID
to show the time of the next entry as well as the diff in minutes, using SQL/Snowflake. CTE preferred.
The following fields should be added:
next_timestamp
: The timestamp of the following entrytime_diff
: The difference in minutes between start_time
and next_timestamp
.entry_order
: The number of how many of this ID
has been.Expected output:
ID data start_time next_timestamp time_diff entry_order
001 X 2021-12-29 10:54:12.429 +0000 2021-12-30 16:32:13.736 +0000 1778 1
001 A 2021-12-30 16:32:13.736 +0000 2022-01-17 10:10:10.736 +0000 25537 2
003 Y 2021-12-31 12:25:12.980 +0000 NULL NULL 1
002 A 2022-01-03 12:49:41.866 +0000 2022-01-16 05:07:55.708 +0000 18258 1
002 Y 2022-01-16 05:07:55.708 +0000 NULL NULL 2
001 A 2022-01-17 10:10:10.736 +0000 NULL NULL 3
Notice, the resulting output is ordered by the timestamp, ascending.
The LEAD function can be used to find the next start_time per ID.
And the ROW_NUMBER function can return a unique sequencial number per ID.
SELECT *
, LEAD(start_time) OVER (PARTITION BY ID ORDER BY start_time) AS next_timestamp
, DATEDIFF(minute, start_time, LEAD(start_time) OVER (PARTITION BY ID ORDER BY start_time)) AS time_diff
, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY start_time) AS entry_order
FROM your_table
ORDER BY start_time