Search code examples
sqlgoogle-bigqueryaggregate-functionsdate-arithmetic

Sum values based on Unique values in other column


I'm working with a table in BigQuery that has millions of lines and I'm trying to reduce the granularity of this data.

My end goal is to have a table looking like this.

Week Model Quantity Price
10 AA 5 90
10 BB 2 40
11 CC 4 50
  • Week : First I want to reduce the number of lines based on Order Date by aggregating the data by week. I believe the following should work for that: DATE_TRUNC(t.DATE_REQUEST, WEEK(MONDAY)) AS PURCHASE_WEEK_MONDAY_START
  • Model : The Model is not unique, but there can only be one Model per Order Number
  • Quantity : The Quantity number is the tricky one as I want to return the MAX Quantity per Order Number. So even though Order Number A23 has three lines of 2 I only want it to return the number 2 (not 6).
  • Price should be SUM for all lines.

So in the example table above Week 10 has two different Models in AA and BB. AA has a sum of Quantity 5 (MAX for Order Number A23 and B45 is 2 and 3) and the Price 90 is the SUM of the Price lines. BB with Order Number E78 has a MAX Quantity of 2 with a Price SUM of 40.

The original table has the following structure. No null values are allowed.

Order Date Order Number Type Model Quantity Price
2023-03-10 A23 Z AA 2 10
2023-03-10 A23 X AA 2 20
2023-03-10 A23 Y AA 2 10
2023-03-11 E78 X BB 2 20
2023-03-11 E78 Z BB 2 20
2023-03-12 B45 Y AA 3 5
2023-03-12 B45 Z AA 3 15
2023-03-12 B45 X AA 3 20
2023-03-12 B45 Q AA 3 10
2023-03-15 C56 X CC 1 30
2023-03-15 C56 Y CC 1 5
2023-03-17 D89 Z CC 3 15

Solution

  • You can do this with two levels of aggregation. First get the maximum of quantity of each week/order/model tuple, then sum by week/order.

    select purchase_week, model, 
        sum(max_quantity) quantity, 
        sum(sum_price) price
    from (
        select
            date_trunc(order_date, ISOWEEK) purchase_week,
            order_number,
            model,
            max(quantity) max_quantity,
            sum(price) sum_price
        from mytable t
        group by purchase_week, order_number, model
    ) t
    group by purchase_week, model
    order by purchase_week, model
    

    Note: I used ISOWEEK to truncate the date; ISO weeks start on Mondays. The first ISO week contains the first Thursday of the year, as explained in the documentation.