I'm handling a table like so:
Name | Status | Date |
---|---|---|
Alfred | 1 | Jan 1 2023 |
Alfred | 2 | Jan 2 2023 |
Alfred | 3 | Jan 2 2023 |
Alfred | 4 | Jan 3 2023 |
Bob | 1 | Jan 1 2023 |
Bob | 3 | Jan 2 2023 |
Carl | 1 | Jan 5 2023 |
Dan | 1 | Jan 8 2023 |
Dan | 2 | Jan 9 2023 |
I'm trying to setup a query so I can handle the following: I'd like to pull the most recent status per Name,
SELECT MAX(Date), Status, Name
FROM test_table
GROUP BY Status, Name
Additionally I'd like in the same query to be able to pull if the user has ever had a status of 2, regardless of if the most recent one is 2 or not
WITH has_2_table AS (
SELECT DISTINCT Name, TRUE as has_2
FROM test_table
WHERE Status = 2 )
And then maybe joining the above on a left join on Name?
But having these as two seperate queries and joining them feels clunky to me, especially since I'd like to add additional columns and other checks. Is there a better way to set this up in one singular query, or is this the most effecient way?
You said, "I'd like to add additional columns" so I interpret that to mean you would like to Select the entire most recent record and add an 'ever-2' column.
You can either do this by joining two queries, or use window functions. Not knowing Snowflake Cloud Data, I cannot tell you which is more efficient.
Join 2 Queries
Select A.*,Coalesce(B.Ever2,"No") as Ever2
From (
Select * From testable x
Where date=(Select max(date) From test_table y
Where x.name=y.name)
) A Left Outer Join (
Select name,"Yes" as Ever2 From test_table
Where status=2
Group By name
) B On A.name=B.name
The first subquery can also be written as an Inner Join if correlated subqueries are implemented badly on your platform.
use of Window Functions
Select * From (
Select row_number() Over (Partition by name, order by date desc, status desc) as bestrow,
A.*,
Coalesce(max(Case When status=2 Then "Yes" End) Over (Partition By name Rows Unbounded Preceding And Unbounded Following), "No") as Ever2
From test_table A
)
Where bestrow=1
This second query type always reads and sorts the entire test_table so it might not be the most efficient.