I intend to create a BQ query that would count the number of 'null' months, for each id, before a specified date (in this example the specified date is 2023-08-01 for all IDs; but I later wish to do this for cases where that date will differ per ID so if you have suggestions that would account for that I would greatly appreciate it)
The desired output would be this: e.g. before 2023-08-01 ID 111 had 3 null months, 222 had 5 null months etc
Tried checking these but its doesn't seem like the data is formatted in the same way
Closest I think I got was this - but the desired output is not exactly what I need (e.g. I don't have the option to mention the exact date from where I want the preceeding null rows to be counted from)
Several CTE are needed. The tbl
is your raw data of the example.
The helper
counts the null
entries which happen in the future to this id
and gives this value in column count_na
. The helper
looks up the count_na
for the next non null item in nr
. Next a subtraction between these values is made.
You only want to display rows which have a non null nr
. The qualify
clause filters only the latest entry.
Please comment out the last two filter lines. Also query each CTE for a better understanding.
With tbl as (
Select *, case row_number() over (order by id, month) when 3 then 132 when 4 then 89 when 8 then 300 when 10 then 114 when 16 then 334 end as nr
from unnest(['111','222'])as id, unnest(generate_date_array(date "2023-01-01",date "2023-08-01",interval 1 month)) as month
)
, helper as (
Select *,
countif(nr is null) over fill count_na
from tbl
window fill as (partition by id order by month desc rows between unbounded preceding and 0 preceding)
order by 1,2
), helper2 as
(
Select *,
last_value(if(nr is null,null,count_na) ignore nulls) over fill as last_count
from helper
window fill as (partition by id order by month desc rows between unbounded preceding and 1 preceding)
order by 1,2
)
Select *,
count_na - last_count as nr_null_months
from helper2
where nr is not null and last_count is not null # comment this out
qualify month=max(month) over (partition by id) # comment this out
order by 1,2