Search code examples
sqloraclemissing-datawindow-functionslag

Oracle SQL, fill missing value with the closest non-missing


I have a dataset in which I want to fill missing values witht the closest non-missing value. I found two elegant solutions in the answers to this question, but I don't understand why they are not working for me.

Table:

create table Tab1(data date, V1 number);
insert into Tab1 values (date '2000-01-01', 1);
insert into Tab1 values (date '2000-02-01', 1);
insert into Tab1 values (date '2000-03-01', 1);
insert into Tab1 values (date '2000-04-01', 1);
insert into Tab1 values (date '2000-05-01', NULL);
insert into Tab1 values (date '2000-06-01', NULL);
insert into Tab1 values (date '2000-03-01', 2);
insert into Tab1 values (date '2000-04-01', 2);
insert into Tab1 values (date '2000-05-01', NULL);
insert into Tab1 values (date '2000-06-01', NULL);
select * from Tab1;

DATA       V1
2000-01-01  1
2000-02-01  1
2000-03-01  1
2000-04-01  1
2000-05-01  
2000-06-01  
2000-03-01  2
2000-04-01  2
2000-05-01  
2000-06-01  

Attempt #1:

select A.*, 
    (case when V1 is null then lag(V1 ignore nulls)  
               over (partition by V1 order by V1, data) 
          else V1 
          end) V2
 from Tab1 A;

Attempt #2:

select A.*, 
    (case when V1 is null 
               then last_value(V1 ignore nulls)
               over (partition by V1 order by data 
               range between unbounded preceding and 1 preceding)  
          else V1 
          end) V2
 from Tab1 A;

Both give me the same unwanted result:

DATA       V1   V2
2000-01-01  1   1
2000-02-01  1   1
2000-03-01  1   1
2000-04-01  1   1
2000-03-01  2   2
2000-04-01  2   2
2000-05-01      
2000-05-01      
2000-06-01      

What am I doing wrong?


Solution

  • Your first version should work, with a slight tweak:

    select A.*, 
           coalesce(V1, lag(V1 ignore nulls)  over (order by data)) V2
    from Tab1 A;
    

    The tweak is to remove the partition by v1 from the lag(). The coalesce() is just my preference for simpler expressions.

    The same tweak should work for the second version as well.

    Your version doesn't work because the lag() value must come from the same partition (or be null). When you have partition by v1, you are actually ensuring that v1 has the same value as in the current row.