sql postgresql amazon-redshift window-functions

How can I get the Redshift/PostgreSQL LAG window function to selectively exclude records?


I have this table in Redshift and am trying to write a query over the dataset below. Rows such as row 3, which are 'RENEWAL SUCCESS' rows preceded by a 'SUB SUCCESS', should be flagged with is_first_renewal = TRUE. However, they may be preceded by any number of 'RENEWAL FAILURE' rows before the renewal succeeds, so a plain LAG with a fixed offset does not work for this scenario.

I also cannot filter out the FAILURE rows, because my query needs them.

id  phone  op    ts                pr  status   result   is_first_renewal
1   3345   xxxx  01/01/2018 6:59   0   SUB      SUCCESS
2   3345   xxxx  28/02/2018 6:59       RENEWAL  FAILURE
3   3345   xxxx  01/03/2018 6:59   20  RENEWAL  SUCCESS  TRUE
4   3345   xxxx  02/04/2018 6:59   20  RENEWAL  SUCCESS  FALSE
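
To make the problem concrete, here is a minimal sketch that runs a plain LAG over the four sample rows. It uses Python's built-in sqlite3 module as a stand-in for Redshift (an assumption made only so the example is self-contained; it needs SQLite 3.25 or later for window functions), and it stores ts as ISO-8601 text so that text ordering matches chronological ordering. The function name previous_rows is hypothetical.

```python
import sqlite3

# The four sample rows, reduced to the columns that matter here.
# ts is rewritten as ISO-8601 text (assumption) so ORDER BY ts is chronological.
ROWS = [
    (1, "3345", "2018-01-01 06:59", "SUB", "SUCCESS"),
    (2, "3345", "2018-02-28 06:59", "RENEWAL", "FAILURE"),
    (3, "3345", "2018-03-01 06:59", "RENEWAL", "SUCCESS"),
    (4, "3345", "2018-04-02 06:59", "RENEWAL", "SUCCESS"),
]

LAG_QUERY = """
SELECT id,
       LAG(status) OVER (PARTITION BY phone ORDER BY ts) AS prev_status,
       LAG(result) OVER (PARTITION BY phone ORDER BY ts) AS prev_result
FROM notifications
ORDER BY ts
"""


def previous_rows(rows):
    """Return (id, prev_status, prev_result) for each row, using a plain LAG."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notifications "
        "(id INTEGER, phone TEXT, ts TEXT, status TEXT, result TEXT)"
    )
    conn.executemany("INSERT INTO notifications VALUES (?, ?, ?, ?, ?)", rows)
    out = conn.execute(LAG_QUERY).fetchall()
    conn.close()
    return out


if __name__ == "__main__":
    for row in previous_rows(ROWS):
        print(row)
```

For row 3 the immediately preceding row is the RENEWAL FAILURE (row 2), not the SUB SUCCESS (row 1), so comparing the LAG value against 'SUB' never matches.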

My current query is this:

CASE
    WHEN (status = 'RENEWAL' AND result = 'SUCCESS')
         AND ( (last_value(CASE
                               WHEN pr = 0.0 AND result= 'SUCCESS' THEN
                                 TRUE
                           END)) 
                OVER (PARTITION BY phone
                ORDER BY ts ASC 
                ROWS unbounded preceding) = 'SUB' ) THEN
      TRUE
    ELSE 
      FALSE
    END AS is_first_renewal
FROM notifications
WHERE ((charge_status = 'SUCCESS')
       OR (status ='RENEWAL'
           AND result = 'FAILED'))
ORDER BY
         op,
         phone,
         ts;

So the goal is to get LAG to skip the 'FAILURE' rows until it reaches the preceding 'SUB SUCCESS' row. Any suggestions? Thanks in advance.


Solution

  • Rather than trying to make LAG skip rows, the approach that came to mind was to target directly the records that should be labelled TRUE. Consider this:

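    -- Successful renewals that have an earlier successful SUB for the same phone,
    -- numbered per phone in chronological order (rn = 1 is the first renewal).
    WITH cte AS (
        SELECT *,
            ROW_NUMBER() OVER (PARTITION BY phone ORDER BY ts) rn
        FROM notifications n1
        WHERE EXISTS (SELECT 1 FROM notifications n2 WHERE n1.phone = n2.phone AND
                      n2.ts < n1.ts AND n2.status = 'SUB' AND n2.result = 'SUCCESS') AND
              n1.status = 'RENEWAL' AND n1.result = 'SUCCESS'
    )

    -- Join back to the full table so the FAILURE rows are kept; rows that are
    -- not in the CTE get NULL for is_first_renewal.
    SELECT n1.*,
        CASE WHEN n2.rn = 1 THEN 'TRUE'
             WHEN n2.rn > 1 THEN 'FALSE' END AS is_first_renewal
    FROM notifications n1
    LEFT JOIN cte n2
        ON n1.phone = n2.phone AND n1.ts = n2.ts;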
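
To try the query above without a database server, the following sketch loads the question's four sample rows into an in-memory SQLite database and runs the same CTE and join, selecting only id and the flag. SQLite (3.25 or later) stands in for Postgres/Redshift here, ts is stored as ISO-8601 text so that it sorts chronologically, and the function name first_renewal_flags is hypothetical; all of these are assumptions of the sketch, not part of the original setup.

```python
import sqlite3

# Sample rows from the question; ts rewritten as ISO-8601 text (assumption).
ROWS = [
    (1, "3345", "xxxx", "2018-01-01 06:59", 0, "SUB", "SUCCESS"),
    (2, "3345", "xxxx", "2018-02-28 06:59", None, "RENEWAL", "FAILURE"),
    (3, "3345", "xxxx", "2018-03-01 06:59", 20, "RENEWAL", "SUCCESS"),
    (4, "3345", "xxxx", "2018-04-02 06:59", 20, "RENEWAL", "SUCCESS"),
]

# Same CTE and join as the answer, returning only id and the flag.
QUERY = """
WITH cte AS (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY phone ORDER BY ts) rn
    FROM notifications n1
    WHERE EXISTS (SELECT 1 FROM notifications n2 WHERE n1.phone = n2.phone AND
                  n2.ts < n1.ts AND n2.status = 'SUB' AND n2.result = 'SUCCESS') AND
          n1.status = 'RENEWAL' AND n1.result = 'SUCCESS'
)
SELECT n1.id,
    CASE WHEN n2.rn = 1 THEN 'TRUE'
         WHEN n2.rn > 1 THEN 'FALSE' END AS is_first_renewal
FROM notifications n1
LEFT JOIN cte n2
    ON n1.phone = n2.phone AND n1.ts = n2.ts
ORDER BY n1.ts
"""


def first_renewal_flags(rows):
    """Load rows into an in-memory table and return (id, is_first_renewal)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notifications (id INTEGER, phone TEXT, op TEXT, "
        "ts TEXT, pr REAL, status TEXT, result TEXT)"
    )
    conn.executemany(
        "INSERT INTO notifications VALUES (?, ?, ?, ?, ?, ?, ?)", rows
    )
    out = conn.execute(QUERY).fetchall()
    conn.close()
    return out


if __name__ == "__main__":
    for row in first_renewal_flags(ROWS):
        print(row)
```

On the sample data, rows 1 and 2 come back with NULL (None in Python), row 3 with 'TRUE' and row 4 with 'FALSE', which matches the expected column in the question.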
    


    This query seems to work in the Postgres demo linked below.

    Demo
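
Since the question asked for a window-function approach, here is a hedged alternative sketch that avoids the self-join by using two running aggregates: one that records whether a successful SUB has been seen so far for the phone, and one that counts successful renewals seen after it. This is not the answer's method but a different technique, again run through SQLite (3.25 or later) as a stand-in for Redshift. The rows for phone 7788, the column names has_sub and renewals_so_far, and the function name flag_first_renewals are all hypothetical, and the sketch assumes timestamps are distinct within a phone. The frame clause is written out explicitly because Redshift requires one when a window aggregate has an ORDER BY.

```python
import sqlite3

# Question's sample rows plus a hypothetical phone with two failures
# before its first successful renewal. ts is ISO-8601 text (assumption).
ROWS = [
    (1, "3345", "2018-01-01 06:59", "SUB", "SUCCESS"),
    (2, "3345", "2018-02-28 06:59", "RENEWAL", "FAILURE"),
    (3, "3345", "2018-03-01 06:59", "RENEWAL", "SUCCESS"),
    (4, "3345", "2018-04-02 06:59", "RENEWAL", "SUCCESS"),
    (5, "7788", "2018-01-05 06:59", "SUB", "SUCCESS"),
    (6, "7788", "2018-02-05 06:59", "RENEWAL", "FAILURE"),
    (7, "7788", "2018-02-06 06:59", "RENEWAL", "FAILURE"),
    (8, "7788", "2018-02-07 06:59", "RENEWAL", "SUCCESS"),
    (9, "7788", "2018-03-07 06:59", "RENEWAL", "FAILURE"),
    (10, "7788", "2018-03-08 06:59", "RENEWAL", "SUCCESS"),
]

QUERY = """
WITH flagged AS (
    -- has_sub = 1 once a successful SUB has appeared for this phone
    SELECT *,
        MAX(CASE WHEN status = 'SUB' AND result = 'SUCCESS' THEN 1 ELSE 0 END)
            OVER (PARTITION BY phone ORDER BY ts
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS has_sub
    FROM notifications
), counted AS (
    -- running count of successful renewals that follow a successful SUB
    SELECT *,
        SUM(CASE WHEN status = 'RENEWAL' AND result = 'SUCCESS' AND has_sub = 1
                 THEN 1 ELSE 0 END)
            OVER (PARTITION BY phone ORDER BY ts
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS renewals_so_far
    FROM flagged
)
SELECT id,
    CASE WHEN status = 'RENEWAL' AND result = 'SUCCESS' AND has_sub = 1
         THEN CASE WHEN renewals_so_far = 1 THEN 'TRUE' ELSE 'FALSE' END
    END AS is_first_renewal
FROM counted
ORDER BY phone, ts
"""


def flag_first_renewals(rows):
    """Return (id, is_first_renewal) using running window aggregates only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notifications "
        "(id INTEGER, phone TEXT, ts TEXT, status TEXT, result TEXT)"
    )
    conn.executemany("INSERT INTO notifications VALUES (?, ?, ?, ?, ?)", rows)
    out = conn.execute(QUERY).fetchall()
    conn.close()
    return out


if __name__ == "__main__":
    for row in flag_first_renewals(ROWS):
        print(row)
```

The FAILURE rows stay in the result with a NULL flag, so nothing has to be filtered out. Like the query above, this flags only the first successful renewal per phone; if a phone can have several subscriptions, the partition would need an extra grouping key.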