So I have a bq table that was created as several calender date snapshot, joined to trx data. Please find below query to populate the table
SELECT
GENERATE_DATE_ARRAY(date_add(DATE(CURRENT_TIMESTAMP), interval -20 day), DATE('2020-08-22')) AS date_array
)
,dim_date AS (
SELECT
sn_date
FROM
date_array_table,
UNNEST(date_array) AS sn_date
)
,data_test as (
select date('2020-08-20') as date, 1 as id, 1000 as num
UNION ALL
select date('2020-08-18') as date, 1 as id, 130 as num
UNION ALL
select date('2020-08-18') as date, 2 as id, 300 as num
UNION ALL
select date('2020-08-13') as date, 2 as id, 250 as num
)
,jjoin as (
select
*
from dim_date
left join
data_test
on 1=1 and sn_date = date
)
select *
from jjoin
order by 1 desc
The result is as following img
next I would like to fill the snapshot row with the NULL values with previous non NULL row by date for each id. I have tried to use max or first_value but it is still NULL. example :
select sn_date
coalesce(num, max (num) over (partition by id order by date)
from jjoin
but it doesn't show previous non Null row. any advice? thanks
expected :
--------------------------
sn_date | date | id | num
--------------------------
08/22 | 08/20| 1 | 1000
08/21 | 08/20| 1 | 1000
08/20 | 08/20| 1 | 1000
08/19 | 08/18| 1 | 130
08/18 | 08/18| 1 | 130
08/18 | 08/18| 2 | 300
08/17 | 08/13| 1 | 250
08/16 | 08/13| 1 | 250
08/15 | 08/13| 1 | 250
You can use last_value()
:
select sn_date, date, id, num,
last_value(date ignore nulls) over (order by date desc),
last_value(id ignore nulls) over (order by date desc),
last_value(num ignore nulls) over (order by date desc)
I should note that the SQL standard supports ignore nulls
on lag()
as well as first_value()
and last_value()
. When I think about solving this problem, I think in terms of lag()
. I think BigQuery is the only database that supports ignore null
s but not on lag()
.