here's an example of the input and the desired output, I think this is easier to grasp than me describing the need.
name | start | end | difference | ongoing_sequence |
---|---|---|---|---|
test 1 | 2024-04-21 00:00:21 | 2024-04-21 00:01:21 | 00:01:00 | 1 |
test 1 | 2024-04-21 00:01:21 | 2024-04-21 00:02:21 | 00:01:00 | 1 |
test 1 | 2024-04-21 00:02:21 | 2024-04-21 00:03:21 | 00:01:00 | 1 |
test 1 | 2024-04-21 00:03:21 | 2024-04-21 00:15:21 | 00:00:00 | 0 |
test 1 | 2024-04-21 00:15:21 | 2024-04-21 00:16:21 | 00:01:00 | 1 |
test 1 | 2024-04-21 00:16:21 | 2024-04-21 00:17:21 | 00:01:00 | 1 |
test 1 | 2024-04-21 00:17:21 | 2024-04-21 00:23:21 | 00:00:00 | 0 |
test 2 | 2024-04-22 00:00:21 | 2024-04-22 00:01:21 | 00:01:00 | 1 |
test 2 | 2024-04-22 00:01:21 | 2024-04-22 00:02:21 | 00:01:00 | 1 |
name | min_start | max_end | difference |
---|---|---|---|
test 1 | 2024-04-21 00:00:21 | 2024-04-21 00:03:21 | 00:03:00 |
test 1 | 2024-04-21 00:15:21 | 2024-04-21 00:17:21 | 00:02:00 |
test 2 | 2024-04-22 00:00:21 | 2024-04-22 00:02:21 | 00:02:00 |
Could you please give me some advice on how to achieve it in a performant way directly in SQL? Iterating the results and creating the output in that way is not an option.
Thanks in advance!
Similar posts exist, like group by time period but this does not lead to the expected results.
You can use a rolling (stepping, tumbling) count()over window1
. Demo at db<>fiddle:
select name
,min("start") as max_start
,max("end") as max_end
,sum(difference) as difference
from(select *,count(*)filter(where ongoing_sequence=0)over(w1) as seq_no
from test
window w1 as (partition by name order by "start"))_
where ongoing_sequence<>0
group by name,seq_no
order by name;
name | max_start | max_end | difference |
---|---|---|---|
test 1 | 2024-04-21 00:00:21 | 2024-04-21 00:03:21 | 00:03:00 |
test 1 | 2024-04-21 00:15:21 | 2024-04-21 00:17:21 | 00:02:00 |
test 2 | 2024-04-22 00:00:21 | 2024-04-22 00:02:21 | 00:02:00 |
The order by "start"
implies range between unbounded preceding and current row
which means each row looks back and counts how many gaps there were so far. Each uninterrupted sequence of rows will share that count - if there was a gap, it'd get incremented.
In the outer query you just ignore the gap rows and add that shared sequence number to the group by "name"
.
The count()filter(where X)
means sum(case when X then 1 else 0 end)
or count(case when X then 1 end)
but it's a bit nicer.
It's called a gaps-and-islands problem.