Search code examples
sqlgoogle-bigqueryrow-number

SQL rownumber until column value changes back


I am trying to create a column that represents how many days ago the most recent promotion started for each product. It should continue counting on when the promotion ended, until the next promotion starts.

For example, I want the following:

product date in_promo days_since_last_promo
1 2023-10-01 false null
1 2023-10-02 false null
1 2023-10-03 true 0
1 2023-10-04 true 1
1 2023-10-05 true 2
1 2023-10-06 false 3
1 2023-10-07 false 4
1 2023-10-08 true 0
1 2023-10-09 true 1
1 2023-10-10 false 2

Particularly, I'm having trouble getting the right days_since_last_promo for these lines:

product date in_promo days_since_last_promo
1 2023-10-06 false 3
1 2023-10-07 false 4

I've been puzzling with lags, row_number() and partition by's but I can't figure it out. Is this even possible in SQL?

I would say it's related to this post, but we're trying to achieve something slightly different.

I tried for example

select 
  product
  , date
  , in_promo
  , row_number() over (partition by recipe_id, in_promo, seqnum_u - seqnum_uo
                      order by date_cet
    ) as days_since_last_promo
from (select p.*,
             row_number() over (partition by product order by date) as seqnum_u,
             row_number() over (partition by product, in_promo order by date) as seqnum_uo
      from product_sales_data as p
      )

But that will give me

product date in_promo days_since_last_promo
1 2023-10-01 false 1
1 2023-10-02 false 2
1 2023-10-03 true 1
1 2023-10-04 true 2
1 2023-10-05 true 3
1 2023-10-06 false 1
1 2023-10-07 false 2
1 2023-10-08 true 1
1 2023-10-09 true 2
1 2023-10-10 false 1

i.e. restarting the row_number when in_promo=false.


Solution

  • Here a solution using ORACLE syntax but with standard analytic functions, assuming the start of the promotion is the first date in_promo for a sequence of rows ( in_promo* !in_promo+ ) (should be easier with MATCH_RECOGNIZE but would be ORACLE-only):

    with data(product, dat, in_promo) as (
        select 1, date '2023-10-01', 'false' from dual union all
        select 1, date '2023-10-02', 'false' from dual union all
        select 1, date '2023-10-03', 'true' from dual union all
        select 1, date '2023-10-04', 'true' from dual union all
        select 1, date '2023-10-05', 'true' from dual union all
        select 1, date '2023-10-06', 'false' from dual union all
        select 1, date '2023-10-07', 'false' from dual union all
        select 1, date '2023-10-08', 'true' from dual union all
        select 1, date '2023-10-09', 'true' from dual union all
        select 1, date '2023-10-10', 'false' from dual
    )
    select d.product, d.dat, 
        sum(ndays) over(partition by product, grp order by dat) as days_since_last_promo
    from (
        select d.*,
            case when in_promo = 0 and grp = 0 then null
            else
                nvl(
                    dat - last_value(dat) over(partition by product, grp order by dat 
                        rows between unbounded preceding and 1 preceding),
                    0
                )
            end
            as ndays
        from (
            select d.*, 
                sum(change) over(partition by product order by dat) as grp
            from (
                select d.*, 
                    decode(in_promo,1,
                        decode(1,lag(in_promo) over(partition by product order by dat),0,1),
                        0
                    ) as change
                from (select product, dat, decode(in_promo,'true',1,0) as in_promo from data) d
            ) d
        ) d
    ) d
    order by dat
    ;
    
    
    1   01/10/2023 00:00:00 
    1   02/10/2023 00:00:00 
    1   03/10/2023 00:00:00 0
    1   04/10/2023 00:00:00 1
    1   05/10/2023 00:00:00 2
    1   06/10/2023 00:00:00 3
    1   07/10/2023 00:00:00 4
    1   08/10/2023 00:00:00 0
    1   09/10/2023 00:00:00 1
    1   10/10/2023 00:00:00 2