I am relatively new to PostgreSQL and am having a difficult time trying to implement the lag() function while ignoring null values. According to the docs, http://www.postgresql.org/docs/9.4/static/functions-window.html, it is not possible to simply pass a parameter to the function to achieve this.
Does anyone know of any workaround? You can make up any example but if it helps here is something to get you started:
Col_A Col_B Col_A_Lag Col_B_Lag ID Date
VA_1 100 VA_1 100 AA Jan-1
null null VA_1 100 AA Jan-2
null null VA_1 100 AA Jan-3
VA_2 250 VA_2 250 AA Jan-4
VA_2 300 VA_2 300 AA Jan-5
null null VA_2 300 AA Jan-6
If I am pulling from tbl_x, here is a simple SQL script:
select
Col_A,
Col_B,
case when Col_A is null then lag(Col_A) over w else Col_A end as Col_A_Lag,
case when Col_B is null then lag(Col_B) over w else Col_B end as Col_B_Lag
from tbl_x
window w as (partition by ID order by Date)
This script will not return what I want because it includes the null values when it 'looks back' in the lag.
Thanks in advance.
I suppose that you can't just:
select
from tbl_x
window w as (partition by ID order by Date)
where col_a is null;
If not then you might have to:
select
Col_A,
Col_B,
case when Col_A is null
then (select col_a
from tbl_x x2
where x2.id = x1.id
and col_a is not null
and x2.date < x1.date
order by date desc
limit 1)
else Col_A
end Col_A_Lag,
case when Col_B is null
then (select col_b
from tbl_x x2
where x2.id = x1.id
and col_b is not null
and x2.date < x1.date
order by date desc
limit 1)
else Col_B
end Col_B_Lag
from tbl_x x1;
With appropriate indexing the performance would probably be pretty good.
create index ... (id, date desc, col_b) where col_b is not null;