Search code examples
sqlgoogle-bigquerynullsnapshotcoalesce

SQL fill NULL row with previous non NULL row


So I have a bq table that was created as several calender date snapshot, joined to trx data. Please find below query to populate the table

  SELECT
    GENERATE_DATE_ARRAY(date_add(DATE(CURRENT_TIMESTAMP), interval -20 day), DATE('2020-08-22')) AS date_array
  )
         
 ,dim_date AS (
  SELECT
    sn_date
  FROM
    date_array_table,
    UNNEST(date_array) AS sn_date
    )
  
 ,data_test as (
 select date('2020-08-20') as date, 1 as id, 1000 as num
 UNION ALL
 select date('2020-08-18') as date, 1 as id, 130 as num
 UNION ALL
 select date('2020-08-18') as date, 2 as id, 300 as num
 UNION ALL
 select date('2020-08-13') as date, 2 as id, 250 as num
 )

 ,jjoin as (
 select
 *
 from dim_date
 left join 
 data_test
 on 1=1 and sn_date = date
 )

 select *
 from jjoin
 order by 1 desc

The result is as following img enter image description here

next I would like to fill the snapshot row with the NULL values with previous non NULL row by date for each id. I have tried to use max or first_value but it is still NULL. example :

select sn_date
coalesce(num, max (num) over (partition by id order by date)
from jjoin

but it doesn't show previous non Null row. any advice? thanks

expected :

--------------------------
sn_date | date | id | num
--------------------------
08/22   | 08/20| 1  | 1000
08/21   | 08/20| 1  | 1000
08/20   | 08/20| 1  | 1000
08/19   | 08/18| 1  | 130
08/18   | 08/18| 1  | 130
08/18   | 08/18| 2  | 300
08/17   | 08/13| 1  | 250
08/16   | 08/13| 1  | 250
08/15   | 08/13| 1  | 250

Solution

  • You can use last_value():

    select sn_date, date, id, num,
           last_value(date ignore nulls) over (order by date desc),
           last_value(id ignore nulls) over (order by date desc),
           last_value(num ignore nulls) over (order by date desc)
    

    I should note that the SQL standard supports ignore nulls on lag() as well as first_value() and last_value(). When I think about solving this problem, I think in terms of lag(). I think BigQuery is the only database that supports ignore nulls but not on lag().