Search code examples
sqlduplicatessnowflake-cloud-data-platformwindow-functionsgaps-and-islands

Review sessions from timestamp column with start/stop event with duplicate start/stop records


I have following case change records:

id case_id state time_created
1 100 REVIEW_NEEDED 2021-03-30 15:11:58.015907000
2 100 REVIEW_NEEDED 2021-04-01 13:08:17.945926000
3 100 REVIEW 2021-04-07 06:20:48.873865000
4 100 WAITING 2021-04-07 06:32:47.159664000
5 100 REVIEW_NEEDED 2021-04-09 06:32:51.132127000
6 100 REVIEW 2021-04-12 04:39:36.426467000
7 100 REVIEW 2021-04-12 04:40:36.000000000
8 100 CLOSED 2021-04-12 04:40:43.133736000
9 101 REVIEW_NEEDED 2021-03-30 20:37:58.015907000
10 101 REVIEW 2021-04-04 13:08:17.945926000
11 101 CLOSED 2021-04-06 06:20:48.873865000
12 101 CLOSED 2021-04-06 06:20:50.000000000

I'd like to report sessions out of these like following:

open_id close_id case_id waiting_time_start handling_time_start handling_time_end
1 4 100 2021-03-30 15:11:58.015907000 2021-04-07 06:20:48.873865000 2021-04-07 06:32:47.159664000
5 8 100 2021-04-09 06:32:51.132127000 2021-04-12 04:39:36.426467000 2021-04-12 04:40:43.133736000
9 11 101 2021-03-30 20:37:58.015907000 2021-04-04 13:08:17.945926000 2021-04-06 06:20:48.873865000

Waiting_time_start: when state = REVIEW_NEEDED

Handling_time_start: when state = REVIEW

Handling_time_end: when state = WAITING or CLOSED

My current solution is to rank the Waiting_time_start, Handling_time_start and Handling_time_end for each case and then join these events on rank, but this is not perfect as there's duplicate records, so number of start/stop events can differ for a case.

Thanks a lot for any ideas!


Solution

  • This is rather complicated. Start by adding a grouping based on the count of "waiting" and "closed" -- but only when they change values:

    select t.*,
           sum(case when (state <> next_state or next_state is null) and
                         state in ('WAITING', 'CLOSED')
                    then 1 else 0
               end) over (partition by caseid order by time_created desc) as grouping
    from (select t.*,
                 lead(state) over (partition by caseid order by time_created) as next_state
          from t
         ) t
    

    Then, you can just aggregate:

    with cte as (
          select t.*,
                 sum(case when (state <> next_state or next_state is null) and
                               state in ('WAITING', 'CLOSED')
                          then 1 else 0
                     end) over (partition by caseid order by time_created desc) as grouping
          from (select t.*,
                       lead(state) over (partition by caseid order by time_created) as next_state
                from t
               ) t
         )
    select caseid, min(id), max(id),
           min(case when status = 'REVIEW_NEEDED' then time_created end),
           min(case when status = 'REVIEW' then time_created end),
           max(time_created)
    from cte
    group by grouping, caseid;