Data Table:
Dates | Customer | Subscription |
---|---|---|
20/02/2020 | A | free |
21/02/2020 | A | free |
22/02/2020 | A | free |
23/02/2020 | B | free |
23/03/2020 | A | full |
01/03/2020 | B | full |
01/03/2020 | A | full |
02/03/2020 | A | full |
Need to fill gaps in dates by the value in the previous date
Output:
Dates | Customer | Last Subscription |
---|---|---|
20/02/2020 | A | free |
21/02/2020 | A | free |
22/02/2020 | A | free |
23/03/2020 | A | full |
23/03/2020 | B | free |
24/02/2020 | A | full |
24/02/2020 | B | free |
25/02/2020 | A | full |
25/02/2020 | B | free |
26/02/2020 | A | full |
26/02/2020 | B | free |
27/02/2020 | A | full |
27/02/2020 | B | free |
28/02/2020 | A | full |
28/02/2020 | B | free |
01/03/2020 | A | full |
01/03/2020 | B | full |
02/03/2020 | A | full |
02/03/2020 | B | full |
I found a similar solution Duplicate groups of records to fill multiple date gaps in Google BigQuery, but it is not suitable because in my example each Customer
has a different start date.
Consider below
with temp as (
select customer, dates from (
select customer, min(dates) min_date, max(dates) max_date
from `project.dataset.table`
group by customer
), unnest(generate_date_array(min_date, max_date)) dates
)
select customer, dates,
first_value(subscription ignore nulls) over win as subscription
from temp a
left join `project.dataset.table` b
using(customer, dates)
window win as (partition by customer order by dates desc rows between current row and unbounded following)
# order by dates, customer
If to apply to sample data in y our question - output is