Search code examples
sqlpostgresql

Group multiple events for ID in postgreSQL data


I'm trying to select pairs of events from my data. I have a data that looks like this:

CREATE TABLE test (id, status, created) AS VALUES
       (1, 'on',  '2024-09-01'::DATE)
     , (2, 'on',  '2024-09-02'::DATE)
     , (1, 'off', '2024-09-03'::DATE)
     , (1, 'on',  '2024-09-04'::DATE)
     , (2, 'off', '2024-09-05'::DATE)
     , (3, 'on',  '2024-09-06'::DATE)
     , (1, 'off', '2024-09-07'::DATE)
     , (4, 'off', '2024-09-08'::DATE)
     , (4, 'off', '2024-09-09'::DATE)
     , (4, 'on',  '2024-09-10'::DATE)
     , (4, 'on',  '2024-09-11'::DATE)
     , (4, 'on',  '2024-09-12'::DATE)
     , (3, 'on',  '2024-09-12'::DATE)
     , (3, 'off', '2024-09-14'::DATE)
     , (4, 'off', '2024-09-16'::DATE);

I would like to have data like this:

id switch_on switch_off
1 2024-09-01 2024-09-03
1 2024-09-04 2024-09-07
2 2024-09-02 2024-09-05
3 2024-09-06
3 2024-09-12 2024-09-14
4 2024-09-08
4 2024-09-09
4 2024-09-10
4 2024-09-11
4 2024-09-12 2024-09-16

I am interested in:

  1. multiple events for the same id in the data
  2. null values when either end of event pair is missing

Solution

  • This is no easy task. What I do below is to build groups of up to two rows using the internal row ID called CTID in PostgreSQL. All rows use their own CTID for a group key, except for 'off' rows following immediately an 'on' row. These get the 'on' rows' CTIDs instead, thus coupling with them.

    In the next step I aggregate by those group keys showing the switch_on and switch_off values. At last PostgreSQL forces me to wrap this again in a select, because it is unable to apply COALESCE on the selected alias names, which I consider a bug.

    select *
    from
    (
      select
        id,
        min(created) filter (where status = 'on') as switch_on,
        min(created) filter (where status = 'off') as switch_off
      from
      (
        select 
          id,
          status,
          created,
          case when status = 'off' and lag(status) over w = 'on'
            then lag(ctid) over w
            else ctid
            end as grp
        from test
        window w as (partition by id order by created, status desc)
      ) grouped
      group by grp, id
    ) result
    order by id, coalesce(switch_on, switch_off);
    

    Demo: https://dbfiddle.uk/KqJoUw_O