Search code examples
sqlcommon-table-expressionlagdistinct-valueshana-sql-script

SQL Lag distinct - to the previous different value


I have the following simplified by many columns table.

 Person     Period           Cost_Center Previous_Cost_Center
----------- ---------------- ----------- ----------------------
123         2019003          1            
123         2019004          1            
123         2019005          2            
123         2019006          2            
123         2019007          3            

For each Person, I would like to find his Previous Cost Center and save it in the corresponding column. The previous Cost Center has to be different from the current one.

What I would like to get:

Person      Period           Cost_Center Previous_Cost_Center
----------- ---------------- ----------- ---------------------
123         2019003          1           NULL 
123         2019004          1           NULL 
123         2019005          2           1 
123         2019006          2           1                     <----- Problematic row 
123         2019007          3           2 

Standard LAG() function over the period would actually output:

Person      Period           Cost_Center Previous_Cost_Center
----------- ---------------- ----------- ---------------------
123         2019003          1           NULL 
123         2019004          1           NULL 
123         2019005          2           1 
123         2019006          2           2                     <----- Problematic row 
123         2019007          3           2 

What I would like to have in a problematic row is the last different Cost_Center value which is 1 and not 2. What I thought of using is to check if the previous Cost_Center is different:

          CASE
                WHEN
                    LAG ( Cost_Center ) OVER ( PARTITION BY Person ORDER BY Period ) != Cost_Center
                THEN
                    LAG ( Cost_Center ) OVER ( PARTITION BY Person ORDER BY Period )
                ELSE
                    Previous_Cost_Center
                END
            AS Previous_Cost_Center,

But then I end up without a Previous_Cost_Center value in row 4 at all. How can this be archived using SQL? How can I take over the last different Cost_Center value to save it in Previous_Cost_Center sorting by Period?


Solution

  • You can do this with window functions, but it is trickier than necessary because the ignore nulls option is not available.

    First, you want to assign a grouping to adjacent values of the same group.

    Second, you want to give the first value in the grouping the previous cost center.

    Third, you want to "spread" that value through the entire group.

    select t.*,
           max(case when immediate_prev_cost_center <> cost_center then immediate_prev_cost_center
               end) over (partition by person, cost_center, (seqnum - seqnum_2)
                         ) as prev_cost_center
      from (select t.*,
                   row_number() over (partition by person order by period) as seqnum,
                   row_number() over (partition by person, cost_center order by period) as seqnum_2,
                   lag(cost_center) over (partition by person order by period) as immediate_prev_cost_center
            from t
           ) t;
    

    The difference of row numbers defines the group. The max() spreads the previous value throughout the group.

    Here is a db<>fiddle.