Tags: sql, postgresql, amazon-redshift, window-functions

Group a sequence of rows by their first value in SQL


How can I group a dataset of sequences by the first value of each sequence in SQL?

For example, I have the following dataset:

id  name  key  metric
1   alice a    0   <- key = 'a', start of a sequence
2   alice b    1
3   alice b    1
-----------------
4   alice a    1   <- key = 'a', start of a sequence
5   alice b    0
6   alice b    0
7   alice b    0
-----------------
8   bob   a    1   <- key = 'a', start of a sequence
9   bob   b    1
-----------------
10  bob   a    0   <- key = 'a', start of a sequence

Each row with key = 'a' starts a new group. I want to, for example, sum the metric of that row and of all subsequent rows until I reach another row with key = 'a' or a different name.

The dataset is sorted by id.

The final result should be this:

id  name   metric
1   alice  2
4   alice  1
8   bob    2
10  bob    0

Here's the equivalent operation in JavaScript, but I want to get the same result with a SQL query.

data.reduce((acc, a) => {
  if (a.key === 'a') {
    // key = 'a' starts a new group
    return [{id: a.id, name: a.name, metric: a.metric}].concat(acc)
  } else {
    // because the data is sorted,
    // all the subsequent rows with key = 'b' belong to the latest group
    const [head, ...tail] = acc
    const head_updated = {...head, metric: head.metric + a.metric}
    return [head_updated, ...tail]
  }
}, [])
.reverse()

Sample SQL dataset:

with dataset as (
  select 
    1       as id
  , 'alice' as name
  , 'a'     as key
  , 0       as metric
  union select
    2       as id
  , 'alice' as name
  , 'b'     as key
  , 1       as metric
  union select
    3       as id
  , 'alice' as name
  , 'b'     as key
  , 1       as metric
  union select 
    4       as id
  , 'alice' as name
  , 'a'     as key
  , 1       as metric
  union select
    5       as id
  , 'alice' as name
  , 'b'     as key
  , 0       as metric
  union select
    6       as id
  , 'alice' as name
  , 'b'     as key
  , 0       as metric
  union select
    7       as id
  , 'alice' as name
  , 'b'     as key
  , 0       as metric
  union select
    8       as id
  , 'bob'   as name
  , 'a'     as key
  , 1       as metric
  union select
    9       as id
  , 'bob'   as name
  , 'b'     as key
  , 1       as metric
  union select
    10      as id
  , 'bob'   as name
  , 'a'     as key
  , 0       as metric
)

select * from dataset
order by name, id

Solution

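  • You can use the window function sum() to number the groups and then aggregate. A running sum of (key = 'a')::int, partitioned by name and ordered by id, increases by 1 at every row with key = 'a', so all rows of one sequence share the same group number (grp):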

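    select min(id) id, name, sum(metric) metric
    from (
      -- grp = number of key = 'a' rows seen so far for this name
      select *,
        sum((key = 'a')::int) over (
          partition by name order by id
          rows unbounded preceding  -- explicit frame; Redshift requires one here
        ) grp
      from dataset
    ) t
    group by name, grp
    order by id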

    Results:

    > id | name  | metric
    > -: | :---- | -----:
    >  1 | alice |      2
    >  4 | alice |      1
    >  8 | bob   |      2
    > 10 | bob   |      0
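
To see why the running sum produces the right groups, it may help to replay the two SQL steps in the question's own JavaScript. This is only an illustrative sketch, not part of the original answer: the names rows, withGroupNumbers and aggregate are hypothetical, and the data is copied from the question.

```javascript
// Rows from the question, already sorted by id.
const rows = [
  { id: 1,  name: 'alice', key: 'a', metric: 0 },
  { id: 2,  name: 'alice', key: 'b', metric: 1 },
  { id: 3,  name: 'alice', key: 'b', metric: 1 },
  { id: 4,  name: 'alice', key: 'a', metric: 1 },
  { id: 5,  name: 'alice', key: 'b', metric: 0 },
  { id: 6,  name: 'alice', key: 'b', metric: 0 },
  { id: 7,  name: 'alice', key: 'b', metric: 0 },
  { id: 8,  name: 'bob',   key: 'a', metric: 1 },
  { id: 9,  name: 'bob',   key: 'b', metric: 1 },
  { id: 10, name: 'bob',   key: 'a', metric: 0 },
];

// Step 1 (hypothetical helper): mimics
//   sum((key = 'a')::int) over (partition by name order by id)
// by keeping one running counter per name and bumping it on each key = 'a' row.
function withGroupNumbers(sortedRows) {
  const counters = new Map(); // name -> count of 'a' rows seen so far
  return sortedRows.map((row) => {
    const grp = (counters.get(row.name) ?? 0) + (row.key === 'a' ? 1 : 0);
    counters.set(row.name, grp);
    return { ...row, grp };
  });
}

// Step 2 (hypothetical helper): mimics
//   select min(id), name, sum(metric) ... group by name, grp order by id
function aggregate(numberedRows) {
  const groups = new Map(); // [name, grp] as JSON -> { id, name, metric }
  for (const { id, name, metric, grp } of numberedRows) {
    const groupKey = JSON.stringify([name, grp]);
    const group = groups.get(groupKey);
    if (group === undefined) {
      groups.set(groupKey, { id, name, metric });
    } else {
      group.id = Math.min(group.id, id);
      group.metric += metric;
    }
  }
  return [...groups.values()].sort((a, b) => a.id - b.id);
}

const numbered = withGroupNumbers(rows);
const result = aggregate(numbered);

console.log(numbered.map((r) => r.grp).join(' ')); // 1 1 1 2 2 2 2 1 1 2
console.log(result); // ids 1, 4, 8, 10 with metrics 2, 1, 2, 0
```

The group number restarts for each name because the counter is kept per name, just as partition by name restarts the running sum. One consequence worth noting: any rows that appear before the first key = 'a' row of a name would all get group number 0 and be aggregated together, in this sketch as in the SQL.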