Search code examples
postgresqllag

PostgreSQL Window Functions Ignore Null Values


I am relatively new to PostgreSQL and am having a difficult time trying to implement the lag() function while ignoring null values. According to the docs, http://www.postgresql.org/docs/9.4/static/functions-window.html, it is not possible to simply pass a parameter to the function to achieve this.

Does anyone know of any workaround? You can make up any example but if it helps here is something to get you started:

Col_A   Col_B  Col_A_Lag  Col_B_Lag  ID  Date
VA_1    100    VA_1       100        AA  Jan-1 
null    null   VA_1       100        AA  Jan-2
null    null   VA_1       100        AA  Jan-3
VA_2    250    VA_2       250        AA  Jan-4
VA_2    300    VA_2       300        AA  Jan-5  
null    null   VA_2       300        AA  Jan-6

If I am pulling from tbl_x, here is a simple SQL script:

select
Col_A,
Col_B,
case when Col_A is null then lag(Col_A) over w else Col_A end as Col_A_Lag,
case when Col_B is null then lag(Col_B) over w else Col_B end as Col_B_Lag
from tbl_x
window w as (partition by ID order by Date)

This script will not return what I want because it includes the null values when it 'looks back' in the lag.

Thanks in advance.


Solution

  • I suppose that you can't just:

    select
    from tbl_x
    window w as (partition by ID order by Date)
    where col_a is null;
    

    If not then you might have to:

    select
    Col_A,
    Col_B,
    case when Col_A is null
      then (select col_a
              from tbl_x x2
             where x2.id = x1.id
               and col_a is not null
               and x2.date < x1.date
          order by date desc
             limit 1)
      else Col_A
      end Col_A_Lag,
    case when Col_B is null
      then (select col_b
              from tbl_x x2
             where x2.id = x1.id
               and col_b is not null
               and x2.date < x1.date
          order by date desc
             limit 1)
      else Col_B
      end Col_B_Lag
    from tbl_x x1;
    

    With appropriate indexing the performance would probably be pretty good.

    create index ... (id, date desc, col_b) where col_b is not null;