Search code examples
postgresqlaggregate-functionswindow-functionsgaps-and-islands

Designate groups of consecutive equal values in an ordered dataset


I'm trying to get the output to look like below. The problem is that I can't do a first_value or RANK because when I partition by event and order by time, then it doesn't break them up in that order. I need them to order by time first and then partition each time.

enter image description here


Solution

  • One of the known solutions

    Use lag() to mark rows when event changes and cumulative sum() to designate groups, e.g.:

    with my_table(event, time) as (
    values 
        ('A', '12:01'),
        ('A', '12:02'),
        ('B', '12:03'),
        ('A', '12:04'),
        ('A', '12:05'),
        ('B', '12:06'),
        ('B', '12:07'),
        ('A', '12:08')
    )
    
    select 
        event, 
        time, 
        sum(change) over (order by time) as "desired row number"
    from (
        select 
            event, 
            time, 
            (event is distinct from lag(event) over (order by time))::int as change
        from my_table
        ) s
    
     event | time  | desired row number 
    -------+-------+--------------------
     A     | 12:01 |                  1
     A     | 12:02 |                  1
     B     | 12:03 |                  2
     A     | 12:04 |                  3
     A     | 12:05 |                  3
     B     | 12:06 |                  4
     B     | 12:07 |                  4
     A     | 12:08 |                  5
    (8 rows)
    

    Custom aggregate

    It would be nice to have the function:

    select *, group_number(event) over (order by time)
    from my_table;
    

    This can be done with the custom aggregate:

    create type group_number_internal as (number int, lag text);
    
    create or replace function group_number_transition(group_number_internal, anyelement)
    returns group_number_internal language sql strict as $$
        select 
            case 
                when $2::text is distinct from $1.lag then $1.number+ 1 
                else $1.number 
            end, 
            $2::text
    $$;
    
    create or replace function group_number_final(group_number_internal)
    returns int language sql as $$
        select $1.number
    $$;
    
    create aggregate group_number(anyelement) (
        sfunc = group_number_transition,
        stype = group_number_internal,
        finalfunc = group_number_final,
        initcond = '(0, null)'
    );
    

    Test it in rextester.