I want to get the "top row" of data for each grouping, along with aggregate metrics that span the full grouping.
Below is a concrete example where I solve my problem using a join.
Example data:
create or replace table TABLE_ID
(
fruit string,
store string,
state string,
cost numeric
);
insert into TABLE_ID
values
('apple', 'Whole Foods', 'CA', 28.3),
('apple', 'Walmart', 'UT', 3.2),
('apple', 'Whole Foods', 'AZ', 4.4),
('apple', 'Walmart', 'NY', 5.1),
('banana', 'Whole Foods', 'CO', 2.3),
('banana', 'Whole Foods', 'AZ', 28.8),
('banana', 'Walmart', 'NY', 93.3),
('banana', 'Whole Foods', 'NY', 20.1);
Solution:
select b.*, a.total_cost
from (
select
fruit, sum(cost) as total_cost
from TABLE_ID
group by fruit
) a
left join
(
select fruit, store as top_purchase_store, state as top_purchase_state
from TABLE_ID
qualify row_number() over (partition by fruit order by cost desc) = 1
) b
on a.fruit = b.fruit
;
Output:
total_cost fruit top_purchase_store top_purchase_state
41 apple Whole Foods CA
144.5 banana Walmart NY
I feel like it should be possible to do this without the use of joins. However, I've not been able to combine first_value with sum aggregate as desired.
Is there another approach you would suggest?
You might consider below approach without using window functions as well.
SELECT SUM(cost) total_cost, fruit,
ANY_VALUE(
STRUCT(store AS top_purchase_store, state AS top_purchase_state)
HAVING MAX cost
).*
FROM TABLE_ID
GROUP BY fruit;
-- Query results
+------------+--------+--------------------+--------------------+
| total_cost | fruit | top_purchase_store | top_purchase_state |
+------------+--------+--------------------+--------------------+
| 41.0 | apple | Whole Foods | CA |
| 144.5 | banana | Walmart | NY |
+------------+--------+--------------------+--------------------+
You can use below instead of ANY_VALUE(...).*
part in above query also.
ARRAY_AGG(STRUCT(
store AS top_purchase_store, state AS top_purchase_state
) ORDER BY cost DESC LIMIT 1)[OFFSET(0)].*