I have a dataset where I want to partition it by timestamp close to each other(say less than 30min)
Driver | Timestamp
A | 10/30/2019 05:02:28
A | 10/30/2019 05:05:28
A | 10/30/2019 05:09:28
A | 10/30/2019 05:12:28
A | 10/30/2019 07:54:28
A | 10/30/2019 07:57:28
A | 10/30/2019 08:02:28
A | 10/30/2019 12:14:28
A | 10/30/2019 12:17:28
A | 10/30/2019 12:22:28
How can we partition it like below:
id | Driver | Timestamp
1 | A | 10/30/2019 05:02:28
1 | A | 10/30/2019 05:05:28
1 | A | 10/30/2019 05:09:28
1 | A | 10/30/2019 05:12:28
2 | A | 10/30/2019 07:54:28
2 | A | 10/30/2019 07:57:28
2 | A | 10/30/2019 08:02:28
3 | A | 10/30/2019 12:14:28
3 | A | 10/30/2019 12:17:28
3 | A | 10/30/2019 12:22:28
Any help would be highly appreciated, thank you so much!
It depends on what you exactly want.
If you want to break into a new group when there is a 30+ minutes gap between two consecutive timestamps, you can use lag()
and a cumulative sum()
:
select
sum(case
when timestamp < lag_timestamp + interval '30' minute
then 0
else 1
end
) id,
driver,
timestamp
from (
select
t.*,
lag(timestamp) over(partition by driver order by timestamp) lag_timestamp
from mytable t
) t