I have a table like:
ID | LOAD_DATE | START_VAL | END_VAL |
---|---|---|---|
1 | 2023-06-01 | NULL | 7 |
1 | 2023-06-03 | NULL | 5 |
1 | 2023-06-03 | 5 | 7 |
2 | 2023-06-08 | NULL | 2 |
2 | 2023-06-09 | NULL | 3 |
2 | 2023-06-10 | NULL | 4 |
I am trying to get the latest record per ID. However I would like to keep the latest record per id where START_VAL and END_VAL are not NULL if there are multiple records for the latest date. For example for ID 1, there are two records for the date 2023-06-03 but I would like to prioritize the record where START_VAL and END_VAL are both not NULL if such a record exists.
i tried a query like:
SELECT *
FROM MY_TABLE
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY LOAD_DATE DESC) = 1;
However this returns a table like
ID | LOAD_DATE | START_VAL | END_VAL |
---|---|---|---|
1 | 2023-06-03 | NULL | 5 |
2 | 2023-06-10 | NULL | 4 |
so for ID 1 the record where both START_VAL and END_VAL are both not NULL are not prioritized in the qualify statement. Is there a way to define this sort of logic in the partition?
It is just about tuning the order by
clause of your window function; you can add another level of sorting, that prioritizes rows where both val
columns are not null
:
select *
from my_table
qualify row_number() over (
partition by id
order by load_date desc,
case when start_val is not null and end_val is not null then 0 else 1 end
) = 1;
Note that, even with this technique, there might still be ties (meaning more than one row of the same id
that have the latest date and two non-null values) - in which case it is actually undefined which record will be picked. I would recommend adding at least one more level to avoid that ; eg. if you have a primary key column, say pk
:
select *
from my_table
qualify row_number() over (
partition by id
order by load_date desc,
case when start_val is not null and end_val is not null then 0 else 1 end,
pk
) = 1;