Search code examples
sqlamazon-redshiftwindow-functions

Redshift SQL identify first_value ordered by sum over partitioned dimensions


I got the following dataset in AWS Redshift and need to identify per each row per dim the bucket with the highest sum of value

dim add_dim bucket value
2 6 3 33
2 5 3 32
2 4 3 31
1 1 1 45
1 3 2 42
1 2 2 41

Result should be

dim add_dim bucket value flag
2 6 3 33 true
2 5 3 32 true
2 4 3 31 true
1 1 1 45 false
1 3 2 42 true
1 2 2 41 true

However using a window function doesn't sort on the sum of values over the partition but instead by each row. How can I sum the rows of bucket per dim for evaluation?

select 
dim
,add_dim
,bucket
,value
,bucket = first_value(bucket) over (partition by dim ORDER BY 
sum(value) desc
,bucket asc
rows between unbounded preceding and unbounded following
) flag 
from (select  1 as bucket, 45 as value, 1 as add_dim, 1 as dim  
union all
 select  2 as bucket, 41 as value, 2 as add_dim, 1 as dim
 union all
 select  2 as bucket, 42 as value, 3 as add_dim, 1 as dim
 union all
 select  3 as bucket, 31 as value, 4 as add_dim, 2 as dim
 union all
 select  3 as bucket, 32 as value, 5 as add_dim, 2 as dim
 union all
 select  3 as bucket, 33 as value, 6 as add_dim, 2 as dim
 )
 group by bucket, value, add_dim, dim

Solution

  • You don't want to group by so you need SUM() OVER but you cannot nest window functions. so you need to create another level of SELECT. This works for your test case:

    with data1 as
    (select  1 as bucket, 45 as value, 1 as add_dim, 1 as dim  
    union all
     select  2 as bucket, 41 as value, 2 as add_dim, 1 as dim
     union all
     select  2 as bucket, 42 as value, 3 as add_dim, 1 as dim
     union all
     select  3 as bucket, 31 as value, 4 as add_dim, 2 as dim
     union all
     select  3 as bucket, 32 as value, 5 as add_dim, 2 as dim
     union all
     select  3 as bucket, 33 as value, 6 as add_dim, 2 as dim
     ),
    sums as (
      select *, sum(value) over (partition by dim, bucket) s
      from data1
    )
    select dim, add_dim, bucket, "value",
      first_value(bucket) over (partition by dim ORDER BY 
    s desc) = bucket as flag
    from sums
    order by dim, add_dim