sql where-clause snowflake-cloud-data-platform window-functions

Filter by partition in Snowflake

I want to filter out records where there is a record with a start_time before than start_time of created status for each id.

e.g. below id A has 'failed' status before 'created' based on start_time. So it needs to be filtered. Whereas id B has first 'created', then another status which is acceptable.

So the expected result is only this, but I'm looking for a scalable solution that works for thousands of rows.

WITH t1 AS (
SELECT 'A' AS id, 'failed' AS status, '2021-05-18 18:30:00'::timestamp AS start_time UNION ALL
SELECT 'A' AS id, 'created' AS status, '2021-05-24 11:30:00'::timestamp AS start_time UNION ALL
SELECT 'A' AS id, 'created' AS status, '2021-05-24 12:00:00'::timestamp AS start_time UNION ALL
SELECT 'B' AS id, 'created' AS status, '2021-05-19 18:30:00'::timestamp AS start_time UNION ALL
SELECT 'B' AS id, 'successful' AS status, '2021-05-20 11:30:00'::timestamp AS start_time
    )
SELECT *
FROM t1

Solution

There are multiple ways to achieve this but here is an approach using first_value.

with t1 (id, status, start_time) as 
(select 'a', 'failed', '2021-05-18 18:30:00'::timestamp union all
 select 'a', 'created', '2021-05-24 11:30:00'::timestamp union all
 select 'a', 'created', '2021-05-24 12:00:00'::timestamp union all
 select 'b', 'created', '2021-05-19 18:30:00'::timestamp union all
 select 'b', 'successful', '2021-05-20 11:30:00'::timestamp)

select *
from t1
qualify first_value(status) over (partition by id order by start_time asc) = 'created'

All you're doing is making sure that the first status for any given id is "created". Think of qualify clause as having clause for window functions. You can also break this down into a subquery if you find that more readable.

Note: Solution above will also keep records that have only "created" status. If you want to make sure there are least two different statuses per id, modify it to

select *
from t1
qualify first_value(status) over (partition by id order by start_time asc) = 'created'
        and 
        count(distinct status) over (partition by id) > 1;