I am trying to create a column that represents how many days ago the most recent promotion started for each product. It should continue counting on when the promotion ended, until the next promotion starts.
For example, I want the following:
product | date | in_promo | days_since_last_promo |
---|---|---|---|
1 | 2023-10-01 | false | null |
1 | 2023-10-02 | false | null |
1 | 2023-10-03 | true | 0 |
1 | 2023-10-04 | true | 1 |
1 | 2023-10-05 | true | 2 |
1 | 2023-10-06 | false | 3 |
1 | 2023-10-07 | false | 4 |
1 | 2023-10-08 | true | 0 |
1 | 2023-10-09 | true | 1 |
1 | 2023-10-10 | false | 2 |
Particularly, I'm having trouble getting the right days_since_last_promo
for these lines:
product | date | in_promo | days_since_last_promo |
---|---|---|---|
1 | 2023-10-06 | false | 3 |
1 | 2023-10-07 | false | 4 |
I've been puzzling with lags, row_number() and partition by's but I can't figure it out. Is this even possible in SQL?
I would say it's related to this post, but we're trying to achieve something slightly different.
I tried for example
select
product
, date
, in_promo
, row_number() over (partition by recipe_id, in_promo, seqnum_u - seqnum_uo
order by date_cet
) as days_since_last_promo
from (select p.*,
row_number() over (partition by product order by date) as seqnum_u,
row_number() over (partition by product, in_promo order by date) as seqnum_uo
from product_sales_data as p
)
But that will give me
product | date | in_promo | days_since_last_promo |
---|---|---|---|
1 | 2023-10-01 | false | 1 |
1 | 2023-10-02 | false | 2 |
1 | 2023-10-03 | true | 1 |
1 | 2023-10-04 | true | 2 |
1 | 2023-10-05 | true | 3 |
1 | 2023-10-06 | false | 1 |
1 | 2023-10-07 | false | 2 |
1 | 2023-10-08 | true | 1 |
1 | 2023-10-09 | true | 2 |
1 | 2023-10-10 | false | 1 |
i.e. restarting the row_number when in_promo=false
.
Here a solution using ORACLE syntax but with standard analytic functions, assuming the start of the promotion is the first date in_promo for a sequence of rows ( in_promo* !in_promo+ ) (should be easier with MATCH_RECOGNIZE but would be ORACLE-only):
with data(product, dat, in_promo) as (
select 1, date '2023-10-01', 'false' from dual union all
select 1, date '2023-10-02', 'false' from dual union all
select 1, date '2023-10-03', 'true' from dual union all
select 1, date '2023-10-04', 'true' from dual union all
select 1, date '2023-10-05', 'true' from dual union all
select 1, date '2023-10-06', 'false' from dual union all
select 1, date '2023-10-07', 'false' from dual union all
select 1, date '2023-10-08', 'true' from dual union all
select 1, date '2023-10-09', 'true' from dual union all
select 1, date '2023-10-10', 'false' from dual
)
select d.product, d.dat,
sum(ndays) over(partition by product, grp order by dat) as days_since_last_promo
from (
select d.*,
case when in_promo = 0 and grp = 0 then null
else
nvl(
dat - last_value(dat) over(partition by product, grp order by dat
rows between unbounded preceding and 1 preceding),
0
)
end
as ndays
from (
select d.*,
sum(change) over(partition by product order by dat) as grp
from (
select d.*,
decode(in_promo,1,
decode(1,lag(in_promo) over(partition by product order by dat),0,1),
0
) as change
from (select product, dat, decode(in_promo,'true',1,0) as in_promo from data) d
) d
) d
) d
order by dat
;
1 01/10/2023 00:00:00
1 02/10/2023 00:00:00
1 03/10/2023 00:00:00 0
1 04/10/2023 00:00:00 1
1 05/10/2023 00:00:00 2
1 06/10/2023 00:00:00 3
1 07/10/2023 00:00:00 4
1 08/10/2023 00:00:00 0
1 09/10/2023 00:00:00 1
1 10/10/2023 00:00:00 2