I got the following dataset in AWS Redshift and need to identify per each row per dim the bucket with the highest sum of value
dim | add_dim | bucket | value |
---|---|---|---|
2 | 6 | 3 | 33 |
2 | 5 | 3 | 32 |
2 | 4 | 3 | 31 |
1 | 1 | 1 | 45 |
1 | 3 | 2 | 42 |
1 | 2 | 2 | 41 |
Result should be
dim | add_dim | bucket | value | flag |
---|---|---|---|---|
2 | 6 | 3 | 33 | true |
2 | 5 | 3 | 32 | true |
2 | 4 | 3 | 31 | true |
1 | 1 | 1 | 45 | false |
1 | 3 | 2 | 42 | true |
1 | 2 | 2 | 41 | true |
However using a window function doesn't sort on the sum of values over the partition but instead by each row. How can I sum the rows of bucket per dim for evaluation?
select
dim
,add_dim
,bucket
,value
,bucket = first_value(bucket) over (partition by dim ORDER BY
sum(value) desc
,bucket asc
rows between unbounded preceding and unbounded following
) flag
from (select 1 as bucket, 45 as value, 1 as add_dim, 1 as dim
union all
select 2 as bucket, 41 as value, 2 as add_dim, 1 as dim
union all
select 2 as bucket, 42 as value, 3 as add_dim, 1 as dim
union all
select 3 as bucket, 31 as value, 4 as add_dim, 2 as dim
union all
select 3 as bucket, 32 as value, 5 as add_dim, 2 as dim
union all
select 3 as bucket, 33 as value, 6 as add_dim, 2 as dim
)
group by bucket, value, add_dim, dim
You don't want to group by so you need SUM() OVER but you cannot nest window functions. so you need to create another level of SELECT. This works for your test case:
with data1 as
(select 1 as bucket, 45 as value, 1 as add_dim, 1 as dim
union all
select 2 as bucket, 41 as value, 2 as add_dim, 1 as dim
union all
select 2 as bucket, 42 as value, 3 as add_dim, 1 as dim
union all
select 3 as bucket, 31 as value, 4 as add_dim, 2 as dim
union all
select 3 as bucket, 32 as value, 5 as add_dim, 2 as dim
union all
select 3 as bucket, 33 as value, 6 as add_dim, 2 as dim
),
sums as (
select *, sum(value) over (partition by dim, bucket) s
from data1
)
select dim, add_dim, bucket, "value",
first_value(bucket) over (partition by dim ORDER BY
s desc) = bucket as flag
from sums
order by dim, add_dim