I'm trying to select pairs of events from my data. I have a data that looks like this:
CREATE TABLE test (id, status, created) AS VALUES
(1, 'on', '2024-09-01'::DATE)
, (2, 'on', '2024-09-02'::DATE)
, (1, 'off', '2024-09-03'::DATE)
, (1, 'on', '2024-09-04'::DATE)
, (2, 'off', '2024-09-05'::DATE)
, (3, 'on', '2024-09-06'::DATE)
, (1, 'off', '2024-09-07'::DATE)
, (4, 'off', '2024-09-08'::DATE)
, (4, 'off', '2024-09-09'::DATE)
, (4, 'on', '2024-09-10'::DATE)
, (4, 'on', '2024-09-11'::DATE)
, (4, 'on', '2024-09-12'::DATE)
, (3, 'on', '2024-09-12'::DATE)
, (3, 'off', '2024-09-14'::DATE)
, (4, 'off', '2024-09-16'::DATE);
I would like to have data like this:
id | switch_on | switch_off |
---|---|---|
1 | 2024-09-01 | 2024-09-03 |
1 | 2024-09-04 | 2024-09-07 |
2 | 2024-09-02 | 2024-09-05 |
3 | 2024-09-06 | |
3 | 2024-09-12 | 2024-09-14 |
4 | 2024-09-08 | |
4 | 2024-09-09 | |
4 | 2024-09-10 | |
4 | 2024-09-11 | |
4 | 2024-09-12 | 2024-09-16 |
I am interested in:
id
in the datanull
values when either end of event pair is missingThis is no easy task. What I do below is to build groups of up to two rows using the internal row ID called CTID
in PostgreSQL. All rows use their own CTID for a group key, except for 'off' rows following immediately an 'on' row. These get the 'on' rows' CTIDs instead, thus coupling with them.
In the next step I aggregate by those group keys showing the switch_on and switch_off values. At last PostgreSQL forces me to wrap this again in a select, because it is unable to apply COALESCE
on the selected alias names, which I consider a bug.
select *
from
(
select
id,
min(created) filter (where status = 'on') as switch_on,
min(created) filter (where status = 'off') as switch_off
from
(
select
id,
status,
created,
case when status = 'off' and lag(status) over w = 'on'
then lag(ctid) over w
else ctid
end as grp
from test
window w as (partition by id order by created, status desc)
) grouped
group by grp, id
) result
order by id, coalesce(switch_on, switch_off);