Here is an example table containing events for a single split test object in the database. The split test can be turned on and off, as tracked in is_active
.
EVENT_ID | SPLIT_TEST_ID | BRANCH | IS_ACTIVE | EVENT_TIME |
---|---|---|---|---|
1 | 595aa50 | a | TRUE | 2021-11-11 22:53:08.360 |
2 | 595aa50 | b | TRUE | 2021-11-11 22:53:08.360 |
3 | 595aa50 | a | FALSE | 2021-11-11 22:34:39.235 |
4 | 595aa50 | b | FALSE | 2021-11-11 22:34:39.235 |
5 | 595aa50 | a | TRUE | 2021-11-02 23:40:27.001 |
6 | 595aa50 | b | TRUE | 2021-11-02 23:40:27.001 |
7 | 595aa50 | a | FALSE | 2021-11-02 20:54:29.620 |
8 | 595aa50 | b | FALSE | 2021-11-02 20:54:29.620 |
9 | 595aa50 | a | TRUE | 2021-11-02 20:31:08.297 |
10 | 595aa50 | b | TRUE | 2021-11-02 20:31:08.297 |
11 | 595aa50 | c | FALSE | 2021-10-05 20:33:36.394 |
12 | 595aa50 | b | FALSE | 2021-10-05 20:33:36.394 |
13 | 595aa50 | c | TRUE | 2021-09-15 21:33:58.856 |
14 | 595aa50 | b | TRUE | 2021-09-15 21:33:58.856 |
15 | 595aa50 | c | FALSE | 2021-09-08 18:42:35.728 |
16 | 595aa50 | b | FALSE | 2021-09-08 18:42:35.728 |
17 | 595aa50 | c | TRUE | 2021-09-01 23:09:15.596 |
18 | 595aa50 | b | TRUE | 2021-09-01 23:09:15.596 |
I'm trying to transform the table so that we get a single row for each branch every time the test is run, with started_at
and ended_at
values instead of event_time
. If the test hasn't yet been turned off, the value of ended_at
should be null
.
This is the result I'm looking for:
SPLIT_TEST_ID | BRANCH | STARTED_AT | ENDED_AT |
---|---|---|---|
595aa50 | a | 2021-11-11 22:53:08.360 | null |
595aa50 | b | 2021-11-11 22:53:08.360 | null |
595aa50 | a | 2021-11-02 23:40:27.001 | 2021-11-11 22:34:39.235 |
595aa50 | b | 2021-11-02 23:40:27.001 | 2021-11-11 22:34:39.235 |
595aa50 | a | 2021-11-02 20:31:08.297 | 2021-11-02 20:54:29.620 |
595aa50 | b | 2021-11-02 20:31:08.297 | 2021-11-02 20:54:29.620 |
595aa50 | c | 2021-09-15 21:33:58.856 | 2021-10-05 20:33:36.394 |
595aa50 | b | 2021-09-15 21:33:58.856 | 2021-10-05 20:33:36.394 |
595aa50 | c | 2021-09-01 23:09:15.596 | 2021-09-08 18:42:35.728 |
595aa50 | b | 2021-09-01 23:09:15.596 | 2021-09-08 18:42:35.728 |
I'm struggling with pulling ended_at
based on is_active
. I've tried things like
select
split_test_id,
branch,
event_time as started_at,
last_value(started_at) over (partition by split_test_id, branch, is_active order by event_time) as ended_at
from example
where is_active = true
but I know they're logically flawed.
SQL to create the example table above:
drop table if exists example;
create table example (
event_id varchar, -- unique
split_test_id varchar,
branch varchar,
is_active boolean,
started_at timestamp_ntz
);
insert into example values
('1', '595aa50', 'a', true, '2021-11-11 22:53:08.360'),
('2', '595aa50', 'b', true, '2021-11-11 22:53:08.360'),
('3', '595aa50', 'a', false, '2021-11-11 22:34:39.235'),
('4', '595aa50', 'b', false, '2021-11-11 22:34:39.235'),
('5', '595aa50', 'a', true, '2021-11-02 23:40:27.001'),
('6', '595aa50', 'b', true, '2021-11-02 23:40:27.001'),
('7', '595aa50', 'a', false, '2021-11-02 20:54:29.620'),
('8', '595aa50', 'b', false, '2021-11-02 20:54:29.620'),
('9', '595aa50', 'a', true, '2021-11-02 20:31:08.297'),
('10', '595aa50', 'b', true, '2021-11-02 20:31:08.297'),
('11', '595aa50', 'c', false, '2021-10-05 20:33:36.394'),
('12', '595aa50', 'b', false, '2021-10-05 20:33:36.394'),
('13', '595aa50', 'c', true, '2021-09-15 21:33:58.856'),
('14', '595aa50', 'b', true, '2021-09-15 21:33:58.856'),
('15', '595aa50', 'c', false, '2021-09-08 18:42:35.728'),
('16', '595aa50', 'b', false, '2021-09-08 18:42:35.728'),
('17', '595aa50', 'c', true, '2021-09-01 23:09:15.596'),
('18', '595aa50', 'b', true, '2021-09-01 23:09:15.596');
select
*
from example;
I figured it out while re-reading the question.
with base as (
select
*,
lag(event_time, 1) over (partition by split_test_id, branch order by event_time desc) as ended_at
from example
)
select
split_test_id,
branch,
event_time as started_at,
ended_at
from base
where is_active = true
order by started_at desc