Search code examples
sqlamazon-redshiftlagwindow-functions

lag to get first non null value since the previous null value


Below is an example of what I'm trying to achieve in a Redshift Database.

I have a variable current_value and I want to create a new column value_desired that is:

  • the same as current_value if the previous row is null
  • equal to the last preceding non-null value if the previous row is non-null

It sounds like an easy task but I haven't found a way to do it yet.

row_numb     current_value   value_desired
1
2
3            47              47
4
5            45              45
6
7
8            42              42
9            41              42
10           40              42
11           39              42
12           38              42
13
14           36              36
15
16
17           33              33
18           32              33

I've tried with the LAG() function but I can only get the previous value (not the first in the "non-null" block), here is my take:

SELECT *
    , CASE WHEN current_value is not null and LAG(current_value) is null THEN current_value
           WHEN current_value is not null and LAG(current_value) is not null 
            THEN LAG(current_value)
      ELSE NULL END AS value_desired
  FROM test1

Any help is much appreciated, thanks.


Solution

  • Here is the correct answer, which gives the right results. There are a few clever tricks here, i suggest you take a careful look through and let me know what needs clarifications

    create test data, as per your question.

    drop table if exists test_table ;
    create table test_table (row_num int,current_value int);
    insert into test_table(row_num, current_value)
    values
      (1,null),
    (2,null),
    (3,47),
    (4,null),
    (5,45),
    (6,null),
    (7,null),
    (8 ,42),
    (9 ,41),
    (10,40  ),
    (11,39 ),
    (12,38 ),
    (13,null),
    (14,36),
    (15,null),
    (16,null),
    (17 ,33),
    (18,32  )
    ;
    

    Then run this code

    SELECT DISTINCT
      j1.row_num,
      CASE WHEN j1.current_value IS NULL
        THEN NULL
      ELSE
        last_value(j2.current_value)
        OVER (
          PARTITION BY j1.row_num
          ORDER BY j2.row_num
          ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) END AS value_desired
    FROM test_table AS j1
      LEFT JOIN (SELECT
                   row_num,
                   current_value,
                   lag(current_value, 1)
                   OVER (
                     ORDER BY row_num ) AS prev_cval
                 FROM test_table) AS j2
        ON j1.row_num >= j2.row_num AND j2.current_value IS NOT NULL
           AND j2.prev_cval IS NULL
    ORDER BY j1.row_num;