sql duplicates snowflake-cloud-data-platform window-functions gaps-and-islands

Review sessions from timestamp column with start/stop event with duplicate start/stop records

I have following case change records:

id	case_id	state	time_created
1	100	REVIEW_NEEDED	2021-03-30 15:11:58.015907000
2	100	REVIEW_NEEDED	2021-04-01 13:08:17.945926000
3	100	REVIEW	2021-04-07 06:20:48.873865000
4	100	WAITING	2021-04-07 06:32:47.159664000
5	100	REVIEW_NEEDED	2021-04-09 06:32:51.132127000
6	100	REVIEW	2021-04-12 04:39:36.426467000
7	100	REVIEW	2021-04-12 04:40:36.000000000
8	100	CLOSED	2021-04-12 04:40:43.133736000
9	101	REVIEW_NEEDED	2021-03-30 20:37:58.015907000
10	101	REVIEW	2021-04-04 13:08:17.945926000
11	101	CLOSED	2021-04-06 06:20:48.873865000
12	101	CLOSED	2021-04-06 06:20:50.000000000

I'd like to report sessions out of these like following:

open_id	close_id	case_id	waiting_time_start	handling_time_start	handling_time_end
1	4	100	2021-03-30 15:11:58.015907000	2021-04-07 06:20:48.873865000	2021-04-07 06:32:47.159664000
5	8	100	2021-04-09 06:32:51.132127000	2021-04-12 04:39:36.426467000	2021-04-12 04:40:43.133736000
9	11	101	2021-03-30 20:37:58.015907000	2021-04-04 13:08:17.945926000	2021-04-06 06:20:48.873865000

Waiting_time_start: when state = REVIEW_NEEDED

Handling_time_start: when state = REVIEW

Handling_time_end: when state = WAITING or CLOSED

My current solution is to rank the Waiting_time_start, Handling_time_start and Handling_time_end for each case and then join these events on rank, but this is not perfect as there's duplicate records, so number of start/stop events can differ for a case.

Thanks a lot for any ideas!

Solution

This is rather complicated. Start by adding a grouping based on the count of "waiting" and "closed" -- but only when they change values:

select t.*,
       sum(case when (state <> next_state or next_state is null) and
                     state in ('WAITING', 'CLOSED')
                then 1 else 0
           end) over (partition by caseid order by time_created desc) as grouping
from (select t.*,
             lead(state) over (partition by caseid order by time_created) as next_state
      from t
     ) t

Then, you can just aggregate:

with cte as (
      select t.*,
             sum(case when (state <> next_state or next_state is null) and
                           state in ('WAITING', 'CLOSED')
                      then 1 else 0
                 end) over (partition by caseid order by time_created desc) as grouping
      from (select t.*,
                   lead(state) over (partition by caseid order by time_created) as next_state
            from t
           ) t
     )
select caseid, min(id), max(id),
       min(case when status = 'REVIEW_NEEDED' then time_created end),
       min(case when status = 'REVIEW' then time_created end),
       max(time_created)
from cte
group by grouping, caseid;