Search code examples
sqlgoogle-bigquerywindow-functions

Using window functions to get first value along with group by sum


I want to get the "top row" of data for each grouping, along with aggregate metrics that span the full grouping.

Below is a concrete example where I solve my problem using a join.

Example data:

create or replace table TABLE_ID
(
fruit string,
store string,
state string,
cost numeric
);

insert into TABLE_ID
values
('apple', 'Whole Foods', 'CA', 28.3),
('apple', 'Walmart', 'UT', 3.2),
('apple', 'Whole Foods', 'AZ', 4.4),
('apple', 'Walmart', 'NY', 5.1),
('banana', 'Whole Foods', 'CO', 2.3),
('banana', 'Whole Foods', 'AZ', 28.8),
('banana', 'Walmart', 'NY', 93.3),
('banana', 'Whole Foods', 'NY', 20.1);

Solution:

select b.*, a.total_cost
from (
  select
  fruit, sum(cost) as total_cost
  from TABLE_ID
  group by fruit
) a
left join
(
  select fruit, store as top_purchase_store, state as top_purchase_state
  from TABLE_ID
  qualify row_number() over (partition by fruit order by cost desc) = 1
) b
on a.fruit = b.fruit
;

Output:

total_cost  fruit   top_purchase_store  top_purchase_state
41          apple   Whole Foods         CA
144.5       banana  Walmart             NY

I feel like it should be possible to do this without the use of joins. However, I've not been able to combine first_value with sum aggregate as desired.

Is there another approach you would suggest?


Solution

  • You might consider below approach without using window functions as well.

    SELECT SUM(cost) total_cost, fruit,
           ANY_VALUE(
             STRUCT(store AS top_purchase_store, state AS top_purchase_state)
             HAVING MAX cost
           ).*
      FROM TABLE_ID
     GROUP BY fruit;
    
    -- Query results
    +------------+--------+--------------------+--------------------+
    | total_cost | fruit  | top_purchase_store | top_purchase_state |
    +------------+--------+--------------------+--------------------+
    |       41.0 | apple  | Whole Foods        | CA                 |
    |      144.5 | banana | Walmart            | NY                 |
    +------------+--------+--------------------+--------------------+
    

    You can use below instead of ANY_VALUE(...).* part in above query also.

           ARRAY_AGG(STRUCT(
             store AS top_purchase_store, state AS top_purchase_state
           ) ORDER BY cost DESC LIMIT 1)[OFFSET(0)].*