How can I group a dataset of sequences by the first value of each sequence in SQL?
For example, I have the following dataset:
id  name   key  metric
1   alice  a    0       <- key = 'a', start of a sequence
2   alice  b    1
3   alice  b    1
------------------------
4   alice  a    1       <- key = 'a', start of a sequence
5   alice  b    0
6   alice  b    0
7   alice  b    0
------------------------
8   bob    a    1       <- key = 'a', start of a sequence
9   bob    b    1
------------------------
10  bob    a    0       <- key = 'a', start of a sequence
Rows with key = 'a' start a new group. I want to, for example, sum the metric of that starting row and of all the subsequent rows until I reach another key = 'a' or another name. The dataset is sorted by id.
The final result should be this:
id  name   metric
1   alice  2
4   alice  1
8   bob    2
10  bob    0
Here's the equivalent operation in JavaScript, but I want to be able to get the same result with a SQL query.
data.reduce((acc, a) => {
  if (a.key === 'a') {
    // key = 'a' starts a new group
    return [{ id: a.id, name: a.name, metric: a.metric }].concat(acc)
  } else {
    // because the data is sorted,
    // all the subsequent rows with key = 'b' belong to the latest group
    const [head, ...tail] = acc
    const head_updated = { ...head, metric: head.metric + a.metric }
    return [head_updated, ...tail]
  }
}, [])
  .reverse()
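The reduce above relies on every name's first row having key = 'a'; it never checks the name itself, although the grouping rule also ends a group when the name changes. A minimal sketch of the same rule that checks both conditions could look like this. It assumes the rows are already sorted by id, and `groupSequences` is a hypothetical helper name, not part of any library.

```javascript
// Sample rows from the question, already sorted by id
const data = [
  { id: 1, name: 'alice', key: 'a', metric: 0 },
  { id: 2, name: 'alice', key: 'b', metric: 1 },
  { id: 3, name: 'alice', key: 'b', metric: 1 },
  { id: 4, name: 'alice', key: 'a', metric: 1 },
  { id: 5, name: 'alice', key: 'b', metric: 0 },
  { id: 6, name: 'alice', key: 'b', metric: 0 },
  { id: 7, name: 'alice', key: 'b', metric: 0 },
  { id: 8, name: 'bob', key: 'a', metric: 1 },
  { id: 9, name: 'bob', key: 'b', metric: 1 },
  { id: 10, name: 'bob', key: 'a', metric: 0 },
]

// Hypothetical helper: a row opens a new group when key = 'a', when it is
// the very first row, or when its name differs from the current group's name.
// Any other row adds its metric to the current (last) group.
function groupSequences(rows) {
  const groups = []
  for (const row of rows) {
    const last = groups[groups.length - 1]
    if (row.key === 'a' || last === undefined || last.name !== row.name) {
      groups.push({ id: row.id, name: row.name, metric: row.metric })
    } else {
      last.metric += row.metric
    }
  }
  return groups
}

console.log(groupSequences(data))
```

Appending to an array and mutating its last element avoids building the list in reverse and calling `.reverse()` at the end.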
Sample SQL dataset:
with dataset as (
  select 1 as id, 'alice' as name, 'a' as key, 0 as metric
  union all select 2, 'alice', 'b', 1
  union all select 3, 'alice', 'b', 1
  union all select 4, 'alice', 'a', 1
  union all select 5, 'alice', 'b', 0
  union all select 6, 'alice', 'b', 0
  union all select 7, 'alice', 'b', 0
  union all select 8, 'bob', 'a', 1
  union all select 9, 'bob', 'b', 1
  union all select 10, 'bob', 'a', 0
)
select * from dataset
order by name, id
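You can use sum() as a window function to number the groups and then aggregate. Within each name (partition by name), the running sum of (key = 'a')::int in id order goes up by one at every row with key = 'a', so all rows of one sequence share the same grp value. Grouping by name and grp then yields one row per sequence. (The ::int cast is PostgreSQL syntax; in other databases, case when key = 'a' then 1 else 0 end does the same.)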
select min(id) as id, name, sum(metric) as metric
from (
  select *,
         sum((key = 'a')::int) over (partition by name order by id) as grp
  from dataset
) t
group by name, grp
order by id
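To see why the running sum works as a group number, the two steps of the query can be emulated in plain JavaScript. This is only an illustration of the logic, not how a database executes the query, and `addGroupNumbers` / `aggregateGroups` are hypothetical names.

```javascript
const rows = [
  { id: 1, name: 'alice', key: 'a', metric: 0 },
  { id: 2, name: 'alice', key: 'b', metric: 1 },
  { id: 3, name: 'alice', key: 'b', metric: 1 },
  { id: 4, name: 'alice', key: 'a', metric: 1 },
  { id: 5, name: 'alice', key: 'b', metric: 0 },
  { id: 6, name: 'alice', key: 'b', metric: 0 },
  { id: 7, name: 'alice', key: 'b', metric: 0 },
  { id: 8, name: 'bob', key: 'a', metric: 1 },
  { id: 9, name: 'bob', key: 'b', metric: 1 },
  { id: 10, name: 'bob', key: 'a', metric: 0 },
]

// Step 1: emulate sum((key = 'a')::int) over (partition by name order by id).
// Walking the rows in id order while keeping one counter per name gives the
// running count of key = 'a' rows inside each name's partition.
function addGroupNumbers(input) {
  const sorted = [...input].sort((x, y) => x.id - y.id)
  const runningCount = new Map() // name -> key = 'a' rows seen so far
  return sorted.map((row) => {
    const count = (runningCount.get(row.name) ?? 0) + (row.key === 'a' ? 1 : 0)
    runningCount.set(row.name, count)
    return { ...row, grp: count }
  })
}

// Step 2: emulate select min(id), name, sum(metric) ... group by name, grp order by id
function aggregateGroups(withGrp) {
  const groups = new Map()
  for (const row of withGrp) {
    const groupKey = JSON.stringify([row.name, row.grp])
    const g = groups.get(groupKey)
    if (g === undefined) {
      groups.set(groupKey, { id: row.id, name: row.name, metric: row.metric })
    } else {
      g.id = Math.min(g.id, row.id)
      g.metric += row.metric
    }
  }
  return [...groups.values()].sort((x, y) => x.id - y.id)
}

const withGrp = addGroupNumbers(rows)
console.log(withGrp.map((r) => `${r.id} ${r.name} grp=${r.grp}`).join('\n'))
console.log(aggregateGroups(withGrp))
```

The grp counter restarts for every name (bob's rows get 1 and 2 again), which is why the outer query has to group by name as well as grp.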
Results:
id  name   metric
1   alice  2
4   alice  1
8   bob    2
10  bob    0