I am working with a list of "sale event" records in a BigQuery data warehouse. Each record represents a time period when a particular item was available to purchase at a specific price. I need to calculate the combined dollar value of all items for sale on each day (assuming quantity = 1 for simplicity).
So for the following input:
Sale_ID | SKU | start_date | end_date | sale_price |
---|---|---|---|---|
ABC | 123 | 2023-01-01 | 2023-01-04 | 3000.00 |
DEF | 123 | 2023-01-05 | 2023-01-10 | 2500.00 |
GHI | 456 | 2023-01-03 | 2023-01-08 | 1200.00 |
JKL | 789 | 2023-01-02 | 2023-01-10 | 2400.00 |
Output would be:
selling_date | total_value_for_sale | items_for_sale* |
---|---|---|
2023-01-01 | 3000.00 | 123 |
2023-01-02 | 5400.00 | 123, 789 |
2023-01-03 | 6600.00 | 123, 456, 789 |
2023-01-04 | 6600.00 | 123, 456, 789 |
2023-01-05 | 6100.00 | 123, 456, 789 |
2023-01-06 | 6100.00 | 123, 456, 789 |
2023-01-07 | 6100.00 | 123, 456, 789 |
2023-01-08 | 6100.00 | 123, 456, 789 |
2023-01-09 | 4900.00 | 123, 789 |
2023-01-10 | 4900.00 | 123, 789 |
*items_for_sale is not required output; I'm just showing it to make the example clearer
I am using the following very simple but computationally intensive solution, which is not ideal for the large volume of data we have. I'm curious if anyone has a method that doesn't duplicate each sale record for each day it was active.
with date_series as (
select dd
from unnest(generate_date_array(date('2023-01-01'), date('2023-01-10'), INTERVAL 1 DAY)) as dd)
select
d.dd as selling_date,
sum(s.sale_price) as total_value_for_sale
from date_series d
left join sales_records s on s.start_date <= d.dd and s.end_date >= d.dd
group by selling_date
order by selling_date
You can try the approach below:
SELECT selling_date, SUM(sale_price) total_value_for_sale, STRING_AGG(SKU, ', ') items_for_sale
FROM sale_event,
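-- GENERATE_DATE_ARRAY defaults to a 1-day step, so this expands each sale into one row per active day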
UNNEST(GENERATE_DATE_ARRAY(start_date, end_date)) selling_date
GROUP BY 1;
This returns the expected output shown above; add ORDER BY selling_date if you need the rows in date order.
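If the row fan-out itself is the worry (both queries above still expand each sale into one row per active day), a sweep-line style rewrite may be worth testing: emit only two delta rows per sale, +sale_price on start_date and -sale_price on the day after end_date, then take a running total across the calendar. Below is a minimal sketch of that idea against the same sale_event table used above; it drops the optional items_for_sale column, and whether it is actually cheaper will depend on your data volumes.

with deltas as (
  -- each sale contributes exactly two rows, independent of how long it runs:
  -- +price on the day it starts, -price on the day after it ends
  select start_date as d, sale_price as delta from sale_event
  union all
  select date_add(end_date, interval 1 day) as d, -sale_price as delta from sale_event
),
daily_delta as (
  -- net change in total value for each date that has any change
  select d, sum(delta) as delta
  from deltas
  group by d
),
bounds as (
  select min(start_date) as min_d, max(end_date) as max_d from sale_event
),
calendar as (
  -- one row per day in the overall selling window
  select dd
  from bounds, unnest(generate_date_array(min_d, max_d)) as dd
)
select
  c.dd as selling_date,
  -- running total of the deltas reconstructs the value for sale on each day
  sum(ifnull(dl.delta, 0)) over (order by c.dd) as total_value_for_sale
from calendar c
left join daily_delta dl on dl.d = c.dd
order by selling_date;

For the sample data this produces the same totals as the expected output (e.g. 4900.00 on 2023-01-09 and 2023-01-10); the trade-off is that the per-day SKU list can't be recovered from price deltas alone.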