Search code examples
sqlsnowflake-cloud-data-platformgreatest-n-per-groupwindow-functions

How to use row_number() over partition by to keep rows where value is not null if it occurs first


I have a table like:

ID LOAD_DATE START_VAL END_VAL
1 2023-06-01 NULL 7
1 2023-06-03 NULL 5
1 2023-06-03 5 7
2 2023-06-08 NULL 2
2 2023-06-09 NULL 3
2 2023-06-10 NULL 4

I am trying to get the latest record per ID. However I would like to keep the latest record per id where START_VAL and END_VAL are not NULL if there are multiple records for the latest date. For example for ID 1, there are two records for the date 2023-06-03 but I would like to prioritize the record where START_VAL and END_VAL are both not NULL if such a record exists.

i tried a query like:

SELECT *
FROM MY_TABLE
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY LOAD_DATE DESC) = 1;

However this returns a table like

ID LOAD_DATE START_VAL END_VAL
1 2023-06-03 NULL 5
2 2023-06-10 NULL 4

so for ID 1 the record where both START_VAL and END_VAL are both not NULL are not prioritized in the qualify statement. Is there a way to define this sort of logic in the partition?


Solution

  • It is just about tuning the order by clause of your window function; you can add another level of sorting, that prioritizes rows where both val columns are not null:

    select *
    from my_table
    qualify row_number() over (
        partition by id 
        order by load_date desc, 
                 case when start_val is not null and end_val is not null then 0 else 1 end
    ) = 1;
    

    Note that, even with this technique, there might still be ties (meaning more than one row of the same id that have the latest date and two non-null values) - in which case it is actually undefined which record will be picked. I would recommend adding at least one more level to avoid that ; eg. if you have a primary key column, say pk:

    select *
    from my_table
    qualify row_number() over (
        partition by id 
        order by load_date desc, 
                 case when start_val is not null and end_val is not null then 0 else 1 end,
                 pk
    ) = 1;