Search code examples
apache-sparkhivehql

Filling in NULLS with previous records - Netezza SQL


I am using Netezza SQL on Aginity Workbench and have the following data:

id           DATE1              DATE2
1            2013-07-27         NULL
2            NULL               NULL
3            NULL               2013-08-02
4            2013-09-10         2013-09-23
5            2013-12-11         NULL
6            NULL               2013-12-19

I need to fill in all the NULL values in DATE1 with preceding values in the DATE1 field that are filled in. With DATE2, I need to do the same, but in reverse order. So my desired output would be the following:

id           DATE1              DATE2
1            2013-07-27         2013-08-02
2            2013-07-27         2013-08-02
3            2013-07-27         2013-08-02
4            2013-09-10         2013-09-23
5            2013-12-11         2013-12-19
6            2013-12-11         2013-12-19

I only have read access to the data. So creating Tables or views are out of the question


Solution

  • How about this?

    select
      id
      ,last_value(date1 ignore nulls) over (
        order by id
        rows between unbounded preceding and current row
      ) date1
      ,first_value(date2 ignore nulls) over (
        order by id
        rows between current row and unbounded following
      ) date2
    

    You can manually calculate this as well, rather than relying on the windowing functions.

    with chain as (
      select 
        this.*,
        prev.date1 prev_date1,
        case when prev.date1 is not null then abs(this.id - prev.id) else null end prev_distance,
        next.date2 next_date2,
        case when next.date2 is not null then abs(this.id - next.id) else null end next_distance
      from 
        Table1 this 
        left outer join Table1 prev on this.id >= prev.id
        left outer join Table1 next on this.id <= next.id
    ), min_distance as (
      select
        id,
        min(prev_distance) min_prev_distance,
        min(next_distance) min_next_distance
      from
        chain
      group by
        id
    )
    select
      chain.id,
      chain.prev_date1,
      chain.next_date2
    from
      chain
      join min_distance on 
        min_distance.id = chain.id
        and chain.prev_distance = min_distance.min_prev_distance
        and chain.next_distance = min_distance.min_next_distance
    order by chain.id
    

    If you're unable to calculate the distance between IDs by subtraction, just replace the ordering scheme by a row_number() call.