Search code examples
sqlwindow-functions

Calculating ended_at column based on event_time and is_active columns


Here is an example table containing events for a single split test object in the database. The split test can be turned on and off, as tracked in is_active.

EVENT_ID SPLIT_TEST_ID BRANCH IS_ACTIVE EVENT_TIME
1 595aa50 a TRUE 2021-11-11 22:53:08.360
2 595aa50 b TRUE 2021-11-11 22:53:08.360
3 595aa50 a FALSE 2021-11-11 22:34:39.235
4 595aa50 b FALSE 2021-11-11 22:34:39.235
5 595aa50 a TRUE 2021-11-02 23:40:27.001
6 595aa50 b TRUE 2021-11-02 23:40:27.001
7 595aa50 a FALSE 2021-11-02 20:54:29.620
8 595aa50 b FALSE 2021-11-02 20:54:29.620
9 595aa50 a TRUE 2021-11-02 20:31:08.297
10 595aa50 b TRUE 2021-11-02 20:31:08.297
11 595aa50 c FALSE 2021-10-05 20:33:36.394
12 595aa50 b FALSE 2021-10-05 20:33:36.394
13 595aa50 c TRUE 2021-09-15 21:33:58.856
14 595aa50 b TRUE 2021-09-15 21:33:58.856
15 595aa50 c FALSE 2021-09-08 18:42:35.728
16 595aa50 b FALSE 2021-09-08 18:42:35.728
17 595aa50 c TRUE 2021-09-01 23:09:15.596
18 595aa50 b TRUE 2021-09-01 23:09:15.596

I'm trying to transform the table so that we get a single row for each branch every time the test is run, with started_at and ended_at values instead of event_time. If the test hasn't yet been turned off, the value of ended_at should be null.

This is the result I'm looking for:

SPLIT_TEST_ID BRANCH STARTED_AT ENDED_AT
595aa50 a 2021-11-11 22:53:08.360 null
595aa50 b 2021-11-11 22:53:08.360 null
595aa50 a 2021-11-02 23:40:27.001 2021-11-11 22:34:39.235
595aa50 b 2021-11-02 23:40:27.001 2021-11-11 22:34:39.235
595aa50 a 2021-11-02 20:31:08.297 2021-11-02 20:54:29.620
595aa50 b 2021-11-02 20:31:08.297 2021-11-02 20:54:29.620
595aa50 c 2021-09-15 21:33:58.856 2021-10-05 20:33:36.394
595aa50 b 2021-09-15 21:33:58.856 2021-10-05 20:33:36.394
595aa50 c 2021-09-01 23:09:15.596 2021-09-08 18:42:35.728
595aa50 b 2021-09-01 23:09:15.596 2021-09-08 18:42:35.728

I'm struggling with pulling ended_at based on is_active. I've tried things like

select 
    split_test_id, 
    branch, 
    event_time as started_at,
    last_value(started_at) over (partition by split_test_id, branch, is_active order by event_time) as ended_at
from example
where is_active = true

but I know they're logically flawed.

SQL to create the example table above:

drop table if exists example;

create table example (
    event_id varchar, -- unique
    split_test_id varchar, 
    branch varchar,
    is_active boolean,
    started_at timestamp_ntz
);
 
insert into example values
    ('1', '595aa50', 'a', true, '2021-11-11 22:53:08.360'),
    ('2', '595aa50', 'b', true, '2021-11-11 22:53:08.360'),
    ('3', '595aa50', 'a', false, '2021-11-11 22:34:39.235'),
    ('4', '595aa50', 'b', false, '2021-11-11 22:34:39.235'),
    ('5', '595aa50', 'a', true, '2021-11-02 23:40:27.001'),
    ('6', '595aa50', 'b', true, '2021-11-02 23:40:27.001'),
    ('7', '595aa50', 'a', false, '2021-11-02 20:54:29.620'),
    ('8', '595aa50', 'b', false, '2021-11-02 20:54:29.620'),
    ('9', '595aa50', 'a', true, '2021-11-02 20:31:08.297'),
    ('10', '595aa50', 'b', true, '2021-11-02 20:31:08.297'),
    ('11', '595aa50', 'c', false, '2021-10-05 20:33:36.394'),
    ('12', '595aa50', 'b', false, '2021-10-05 20:33:36.394'),
    ('13', '595aa50', 'c', true, '2021-09-15 21:33:58.856'),
    ('14', '595aa50', 'b', true, '2021-09-15 21:33:58.856'),
    ('15', '595aa50', 'c', false, '2021-09-08 18:42:35.728'),
    ('16', '595aa50', 'b', false, '2021-09-08 18:42:35.728'),
    ('17', '595aa50', 'c', true, '2021-09-01 23:09:15.596'),
    ('18', '595aa50', 'b', true, '2021-09-01 23:09:15.596');

select 
    *
from example;

Solution

  • I figured it out while re-reading the question.

    with base as (
        select 
            *,
            lag(event_time, 1) over (partition by split_test_id, branch order by event_time desc) as ended_at
        from example
    )
    
    select 
        split_test_id, 
        branch, 
        event_time as started_at,
        ended_at
    from base 
    where is_active = true
    order by started_at desc