Search code examples
google-bigqueryaggregate-functionswindow-functionscumulative-sumquota

bigquery SQL, aggregate based on quota value


Problem :

  • So I have a table containing quota batch of a product. For 1 product, it will have several quota_id with each quota_id has different value of quota. Please find this table in CTE quota

  • Then I have a table of purchase, which need to be joined with quota table and aggregate it. The purchase is aggregated to fill the first batch(the first is batch with earlier batch_date), then moved to second batch when the quota is full. Refer to CTE purchase

SQL :

WITH quota AS (
select 'Product A' as product, 'Quota_batch_1' as quota_id, 10 as quota, '2022-01-01' as batch_date
UNION All
select 'Product A' as product, 'Quota_batch_2' as quota_id, 20 as quota, '2022-02-01' as batch_date
),

purchase as (
  select '2022-01-01' as sales_date, 5 as qty, 'sales_1' as sales_id, 'Product A' as product, 100 as price
  UNION ALL
  select '2022-01-01' as sales_date, 5 as qty, 'sales_2' as sales_id, 'Product A' as product, 150 as price
  UNION ALL
  select '2022-02-03' as sales_date, 2 as qty, 'sales_3' as sales_id, 'Product A' as product, 200 as price
)
select 
quota.*,
sum(qty) as quota_filled_by_qty,
sum(price) as total_price

from quota
left join purchase using (product)
group by 1,2,3,4

Expected result : enter image description here

I have no idea to do the join or the aggregation, as my current result is still aggregated for all result. I've explored like using analytical function, cumulative sum etc but still unable to crack the problem.

Current result : enter image description here

Any idea? thank you


Solution

  • An approach using LEAD navigation function and adjusting your JOIN condition:

    WITH quota AS (
    select 'Product A' as product, 'Quota_batch_1' as quota_id, 10 as quota, '2022-01-01' as batch_date
    UNION All
    select 'Product A' as product, 'Quota_batch_2' as quota_id, 20 as quota, '2022-02-01' as batch_date
    ),
    purchase as (
      select '2022-01-01' as sales_date, 5 as qty, 'sales_1' as sales_id, 'Product A' as product, 100 as price
      UNION ALL
      select '2022-01-01' as sales_date, 5 as qty, 'sales_2' as sales_id, 'Product A' as product, 150 as price
      UNION ALL
      select '2022-02-03' as sales_date, 2 as qty, 'sales_3' as sales_id, 'Product A' as product, 200 as price
    ),
    quota_next_batch as (
      SELECT 
        *, 
        LEAD(batch_date) OVER (PARTITION BY product ORDER BY batch_date) as next_batch_date
      FROM quota
    )
    SELECT 
      q.product,
      q.quota_id,
      q.quota,
      q.batch_date,
      SUM(p.qty) as quota_filled_by_qty,
      SUM(p.price) as total_price
    FROM quota_next_batch q 
    LEFT JOIN purchase p
      ON p.product = q.product AND p.sales_date >= q.batch_date AND (q.next_batch_date IS NULL OR p.sales_date < q.next_batch_date)
    GROUP BY q.product, q.quota_id, q.quota, q.batch_date
    

    Output:

    product     quota_id        quota   batch_date  quota_filled_by_qty total_price
    Product A   Quota_batch_1   10      2022-01-01  10                  250
    Product A   Quota_batch_2   20      2022-02-01  2                   200