I want to get the same first/last not null value for all the records in a same window. I've prepared this example data:
WITH sample_data AS (
SELECT 1 AS id, NULL AS value, CURRENT_TIMESTAMP() AS update_ts
UNION ALL
SELECT 1, "Good", TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
UNION ALL
SELECT 1, "Bad", TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 DAY)
)
SELECT
id,
value,
FIRST_VALUE(value IGNORE NULLS) OVER (ids) AS first_valid_value,
LAST_VALUE(value IGNORE NULLS) OVER (ids) AS last_valid_value,
update_ts
FROM sample_data
WINDOW ids AS (PARTITION BY id ORDER BY update_ts DESC)
This is the output for this query:
id | value | first_valid_value | last_valid_value | update_ts |
---|---|---|---|---|
1 | 2024-12-13 12:37:05.762489 UTC | |||
1 | Good | Good | Good | 2024-12-12 12:37:05.762489 UTC |
1 | Bad | Good | Bad | 2024-12-11 12:37:05.762489 UTC |
There are two things which are confusing me:
INGORE NULLS
part because I want to get a "valid" value, so I was expecting the first row (with NULL value
field), to inherit the "Good" value in the first_valid_value
field, and "Bad" for last_valid_value
last_valid_value
didn't inherited the "Bad" value, as the window function is definingThis is the output I want to achieve and which I was expecting:
id | value | first_valid_value | last_valid_value | update_ts |
---|---|---|---|---|
1 | Good | Bad | 2024-12-13 12:37:05.762489 UTC | |
1 | Good | Good | Bad | 2024-12-12 12:37:05.762489 UTC |
1 | Bad | Good | Bad | 2024-12-11 12:37:05.762489 UTC |
Why is this happening, and how could I achieve the final result?
You need to specify the window frame clause:
SELECT
id,
value,
FIRST_VALUE(value IGNORE NULLS) OVER (ids) AS first_valid_value,
LAST_VALUE(value IGNORE NULLS) OVER (ids) AS last_valid_value,
update_ts
FROM sample_data
WINDOW ids AS (PARTITION BY id ORDER BY update_ts DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
From the docs:
For aggregate analytic functions, if the
ORDER BY
clause is present but the window frame clause is not, the following window frame clause is used by default:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW