Search code examples
snowflake-cloud-data-platformwindow-functions

How to calculate windows in snowflake dataset?


Database is Snowflake.
I have a dataset of cases created by customers and there are agents assigned to these cases. how_to_get_agent_72hr column is what I am trying to figure out. I have added this column to show expected output. The requirement is that any case created within the 72 hours should be assigned to agent from the previous case. Important note: Notice that case 15 is within 72 hours case 14 but it's not assigned to Alejandro because window for Alejandro started at 2024-04-05 17:47:00 and would end at 2024-04-08 17:47:00. So 72 hours is calculate from the time the window started.

customer_id case_id case_dt agent how_to_get_agent_72hr Notes
1 case_1 2020-11-10 16:52:22 Sean Sean
1 case_2 2020-11-25 17:50:55 Alico Alico
1 case_3 2022-07-27 17:54:34 Katherine Katherine
1 case_4 2022-10-05 11:36:43 Victor Victor
1 case_5 2022-10-18 23:11:03 Automated Automated window1 starts here
1 case_6 2022-10-19 8:44:58 Denzel Automated
1 case_7 2022-10-20 17:13:00 Salesforce Automated
1 case_8 2022-10-20 17:15:44 Salesforce Automated ends
1 case_9 2023-02-24 12:32:16 Aryll Aryll
1 case_10 2023-11-02 17:29:04 Kristine Kristine
1 case_11 2023-12-23 16:34:00 Katherine Katherine
1 case_12 2024-04-05 17:47:00 Alejandro Alejandro window2 starts here
1 case_13 2024-04-06 21:49:11 Angel Alejandro
1 case_14 2024-04-08 8:16:51 Wilbert Alejandro ends
1 case_15 2024-04-09 16:34:27 Ezekiel Ezekiel window3 starts here
1 case_16 2024-04-09 17:16:27 Renz Ezekiel
1 case_17 2024-04-10 17:59:06 Raymond Ezekiel
1 case_18 2024-04-11 22:35:56 Glen Ezekiel end
1 case_19 2024-04-15 13:37:32 Rashid Rashid

Solution

  • Tabular JavaScript UDFs (UDTFs)

    create or replace function carry_func(epoch float, name string) 
    returns table(carry_name string)
    language javascript as
    '{
        initialize: function (argumentInfo, context) {
            this.last_epoch = 0.0;
            this.last_name = "";
        },
        processRow: function f(row, rowWriter, context){
            var diff = row.EPOCH - this.last_epoch;
            if(diff > 259200.0 /*72*60*60*/) {
                this.last_epoch = row.EPOCH;
                this.last_name = row.NAME;
            }
            rowWriter.writeRow({CARRY_NAME: this.last_name});
        }
    }';
    

    over your data:

    with data(customer_id, case_id, case_dt, agent, how_to_get_agent_72hr, Notes) as (
        select * from values
      (1, 'case_1' , '2020-11-10 16:52:22'::timestamp, 'Sean', 'Sean', ''),
      (1, 'case_2' , '2020-11-25 17:50:55'::timestamp, 'Alico', 'Alico', ''),
      (1, 'case_3' , '2022-07-27 17:54:34'::timestamp, 'Katherine', 'Katherine', ''),
      (1, 'case_4' , '2022-10-05 11:36:43'::timestamp, 'Victor', 'Victor', ''),
      (1, 'case_5' , '2022-10-18 23:11:03'::timestamp, 'Automated', 'Automated', 'window1 starts here'),
      (1, 'case_6' , '2022-10-19 08:44:58'::timestamp, 'Denzel', 'Automated', ''),
      (1, 'case_7' , '2022-10-20 17:13:00'::timestamp, 'Salesforce', 'Automated', ''),
      (1, 'case_8' , '2022-10-20 17:15:44'::timestamp, 'Salesforce', 'Automated', 'ends'),
      (1, 'case_9' , '2023-02-24 12:32:16'::timestamp, 'Aryll', 'Aryll', ''),
      (1, 'case_10', '2023-11-02 17:29:04'::timestamp, 'Kristine', 'Kristine', ''),
      (1, 'case_11', '2023-12-23 16:34:00'::timestamp, 'Katherine', 'Katherine', ''),
      (1, 'case_12', '2024-04-05 17:47:00'::timestamp, 'Alejandro', 'Alejandro', 'window2 starts here'),
      (1, 'case_13', '2024-04-06 21:49:11'::timestamp, 'Angel', 'Alejandro', ''),
      (1, 'case_14', '2024-04-08 08:16:51'::timestamp, 'Wilbert', 'Alejandro', 'ends'),
      (1, 'case_15', '2024-04-09 16:34:27'::timestamp, 'Ezekiel', 'Ezekiel', 'window3 starts here'),
      (1, 'case_16', '2024-04-09 17:16:27'::timestamp, 'Renz', 'Ezekiel', ''),
      (1, 'case_17', '2024-04-10 17:59:06'::timestamp, 'Raymond', 'Ezekiel', ''),
      (1, 'case_18', '2024-04-11 22:35:56'::timestamp, 'Glen', 'Ezekiel', 'end'),
      (1, 'case_19', '2024-04-15 13:37:32'::timestamp, 'Rashid', 'Rashid', '')
    )
    select *
    from data as d,
     table(carry_func(DATE_PART(epoch_second,d.case_dt)::float, d.agent) 
          over (partition by d.customer_id order by d.case_dt))
    

    make sure to use the OVER clause to partition and order those rows, or your results are random..

    CUSTOMER_ID CASE_ID CASE_DT AGENT HOW_TO_GET_AGENT_72HR NOTES CARRY_NAME
    1 case_1 2020-11-10 16:52:22.000 Sean Sean Sean
    1 case_2 2020-11-25 17:50:55.000 Alico Alico Alico
    1 case_3 2022-07-27 17:54:34.000 Katherine Katherine Katherine
    1 case_4 2022-10-05 11:36:43.000 Victor Victor Victor
    1 case_5 2022-10-18 23:11:03.000 Automated Automated window1 starts here Automated
    1 case_6 2022-10-19 08:44:58.000 Denzel Automated Automated
    1 case_7 2022-10-20 17:13:00.000 Salesforce Automated Automated
    1 case_8 2022-10-20 17:15:44.000 Salesforce Automated ends Automated
    1 case_9 2023-02-24 12:32:16.000 Aryll Aryll Aryll
    1 case_10 2023-11-02 17:29:04.000 Kristine Kristine Kristine
    1 case_11 2023-12-23 16:34:00.000 Katherine Katherine Katherine
    1 case_12 2024-04-05 17:47:00.000 Alejandro Alejandro window2 starts here Alejandro
    1 case_13 2024-04-06 21:49:11.000 Angel Alejandro Alejandro
    1 case_14 2024-04-08 08:16:51.000 Wilbert Alejandro ends Alejandro
    1 case_15 2024-04-09 16:34:27.000 Ezekiel Ezekiel window3 starts here Ezekiel
    1 case_16 2024-04-09 17:16:27.000 Renz Ezekiel Ezekiel
    1 case_17 2024-04-10 17:59:06.000 Raymond Ezekiel Ezekiel
    1 case_18 2024-04-11 22:35:56.000 Glen Ezekiel end Ezekiel
    1 case_19 2024-04-15 13:37:32.000 Rashid Rashid Rashid