I'm working with a table in BigQuery that has millions of lines and I'm trying to reduce the granularity of this data.
My end goal is to have a table looking like this.
Week | Model | Quantity | Price |
---|---|---|---|
10 | AA | 5 | 90 |
10 | BB | 2 | 40 |
11 | CC | 4 | 50 |
DATE_TRUNC(t.DATE_REQUEST, WEEK(MONDAY)) AS PURCHASE_WEEK_MONDAY_START
So in the example table above Week 10 has two different Models in AA and BB. AA has a sum of Quantity 5 (MAX for Order Number A23 and B45 is 2 and 3) and the Price 90 is the SUM of the Price lines. BB with Order Number E78 has a MAX Quantity of 2 with a Price SUM of 40.
The original table has the following structure. No null values are allowed.
Order Date | Order Number | Type | Model | Quantity | Price |
---|---|---|---|---|---|
2023-03-10 | A23 | Z | AA | 2 | 10 |
2023-03-10 | A23 | X | AA | 2 | 20 |
2023-03-10 | A23 | Y | AA | 2 | 10 |
2023-03-11 | E78 | X | BB | 2 | 20 |
2023-03-11 | E78 | Z | BB | 2 | 20 |
2023-03-12 | B45 | Y | AA | 3 | 5 |
2023-03-12 | B45 | Z | AA | 3 | 15 |
2023-03-12 | B45 | X | AA | 3 | 20 |
2023-03-12 | B45 | Q | AA | 3 | 10 |
2023-03-15 | C56 | X | CC | 1 | 30 |
2023-03-15 | C56 | Y | CC | 1 | 5 |
2023-03-17 | D89 | Z | CC | 3 | 15 |
You can do this with two levels of aggregation. First get the maximum of quantity of each week/order/model tuple, then sum by week/order.
select purchase_week, model,
sum(max_quantity) quantity,
sum(sum_price) price
from (
select
date_trunc(order_date, ISOWEEK) purchase_week,
order_number,
model,
max(quantity) max_quantity,
sum(price) sum_price
from mytable t
group by purchase_week, order_number, model
) t
group by purchase_week, model
order by purchase_week, model
Note: I used ISOWEEK
to truncate the date; ISO weeks start on Mondays. The first ISO week contains the first Thursday of the year, as explained in the documentation.