Search code examples
sqlsnowflake-cloud-data-platformaggregate-functionswindow-functions

Checking conditions per group, and ranking most recent row?


I'm handling a table like so:

Name Status Date
Alfred 1 Jan 1 2023
Alfred 2 Jan 2 2023
Alfred 3 Jan 2 2023
Alfred 4 Jan 3 2023
Bob 1 Jan 1 2023
Bob 3 Jan 2 2023
Carl 1 Jan 5 2023
Dan 1 Jan 8 2023
Dan 2 Jan 9 2023

I'm trying to setup a query so I can handle the following: I'd like to pull the most recent status per Name,

SELECT MAX(Date), Status, Name
FROM test_table
GROUP BY Status, Name

Additionally I'd like in the same query to be able to pull if the user has ever had a status of 2, regardless of if the most recent one is 2 or not

WITH has_2_table AS (
SELECT DISTINCT Name, TRUE as has_2
FROM test_table
WHERE Status = 2 )

And then maybe joining the above on a left join on Name?

But having these as two seperate queries and joining them feels clunky to me, especially since I'd like to add additional columns and other checks. Is there a better way to set this up in one singular query, or is this the most effecient way?


Solution

  • You said, "I'd like to add additional columns" so I interpret that to mean you would like to Select the entire most recent record and add an 'ever-2' column.

    You can either do this by joining two queries, or use window functions. Not knowing Snowflake Cloud Data, I cannot tell you which is more efficient.

    Join 2 Queries

    Select A.*,Coalesce(B.Ever2,"No") as Ever2
    From (
        Select * From testable x
        Where date=(Select max(date) From test_table y
                    Where x.name=y.name)
    ) A Left Outer Join (
        Select name,"Yes" as Ever2 From test_table
        Where status=2
        Group By name
    ) B On A.name=B.name
    

    The first subquery can also be written as an Inner Join if correlated subqueries are implemented badly on your platform.

    use of Window Functions

    Select * From (
        Select row_number() Over (Partition by name, order by date desc, status desc) as bestrow,
        A.*,
        Coalesce(max(Case When status=2 Then "Yes" End) Over (Partition By name Rows Unbounded Preceding And Unbounded Following), "No") as Ever2
        From test_table A
    )
    Where bestrow=1
    

    This second query type always reads and sorts the entire test_table so it might not be the most efficient.