Search code examples
sqlgoogle-bigquerydata-warehouse

Aggregate functions for daily totals using records with start and end dates


I am working with a list of "sale event" records in a BigQuery data warehouse. Each represents a time period when a particular item was available to purchase at a specific price. I need to calculate the combined dollar value of all items for sale on each day - assuming quantity = 1 here for simplicity.

So for the following input:

Sale_ID SKU start_date end_date sale_price
ABC 123 2023-01-01 2023-01-04 3000.00
DEF 123 2023-01-05 2023-01-10 2500.00
GHI 456 2023-01-03 2023-01-08 1200.00
JKL 789 2023-01-02 2023-01-10 2400.00

Output would be:

selling_date total_value_for_sale items_for_sale*
2023-01-01 3000.00 123
2023-01-02 5400.00 123, 789
2023-01-03 6600.00 123, 456, 789
2023-01-04 6600.00 123, 456, 789
2023-01-05 6100.00 123, 456, 789
2023-01-06 6100.00 123, 456, 789
2023-01-07 6100.00 123, 456, 789
2023-01-08 6100.00 123, 456, 789
2023-01-09 3900.00 123, 789
2023-01-10 3900.00 123, 789

*items_for_sale is not required output, just showing it to make the example clearer

I am using the following very simple but computationally intensive solution, which is not ideal for the large volume of data we have. I'm curious if anyone has a method that doesn't duplicate each sale record for each day it was active.

with date_series as (
select dd
from unnest(generate_date_array(date('2023-01-01'), date('2023-01-10'), INTERVAL 1 DAY)) as dd)

select
d.dd as selling_date,
sum(price) as total_value_for_sale
from date_series d
left join sales_records s on s.start_date <= d.dd and s.end_date >= d.dd
group by selling_date
order by selling_date

Solution

  • You can try below

    SELECT selling_date, SUM(sale_price) total_value_for_sale, STRING_AGG(SKU, ', ') items_for_sale
      FROM sale_event,
    UNNEST(GENERATE_DATE_ARRAY(start_date, end_date)) selling_date
     GROUP BY 1;
    

    Query results

    enter image description here