Search code examples
sql-servert-sql

Retrieve only the records before a specific patterns happen


In the dataset I have many Case ID’s all of them that goes through some statuses, the process is not fixed, but at a point in time the Case's should be Closed in the CLSD status.

Data

For the two CaseId examples I would like to get (for each) the rows that is before including CLSD row. For instance, CaseId 789101 I would like to get all the rows except NCMP and PLIP since they have a higher FromDateTime and / or EventNumber than CLSD row. There is cases that has now reach CLSD status yet and those I want all the rows from.
ToDateTimeWithNull column with NULL values represent the current status of the CaseId.


Solution

  • This can be solved with EXISTS selection:

    WITH yourdataset AS (
    SELECT  *
    FROM
    (
        VALUES  (N'123456', N'INIT', N'2024-01-20 08:44:10.000', N'2024-01-22 12:04:24.000', N'2024-01-22 12:04:24.000', 1)
        ,   (N'123456', N'CANC', N'2024-01-22 12:04:24.000', N'2024-01-22 12:04:24.000', N'2024-01-22 12:04:24.000', 2)
        ,   (N'123456', N'CLSD', N'2024-01-22 12:04:24.000', N'2024-01-22 12:04:24.000', N'2024-01-22 12:04:24.000', 3)
        ,   (N'123456', N'NCMP', N'2024-01-22 12:04:24.000', N'2024-05-21 11:51:24.017', NULL, N'4')
        ,   (N'789101', N'INIT', N'2023-01-02 08:22:21.000', N'2023-03-19 15:07:14.000', N'2023-03-19 15:07:14.000', 1)
        ,   (N'789101', N'AWAP', N'2023-03-19 15:07:14.000', N'2023-03-19 15:07:35.000', N'2023-03-19 15:07:35.000', 2)
        ,   (N'789101', N'RICT', N'2023-03-19 15:07:35.000', N'2023-03-19 15:15:31.000', N'2023-03-19 15:15:31.000', 3)
        ,   (N'789101', N'CANC', N'2023-03-19 15:15:31.000', N'2023-03-20 05:00:28.000', N'2023-03-20 05:00:28.000', 4)
        ,   (N'789101', N'CLSD', N'2023-03-20 05:00:28.000', N'2023-03-20 05:00:28.000', N'2023-03-20 05:00:28.000', 5)
        ,   (N'789101', N'NCMP', N'2023-03-20 05:00:28.000', N'2024-05-13 08:22:35.000', N'2024-05-13 08:22:35.000', 6)
        ,   (N'789101', N'PLIP', N'2024-05-13 08:22:35.000', N'2024-05-21 11:51:24.017', NULL, N'7')
    ) t (caseid, activityname, fromdatetime, todatetime, todatetimewithnull, eventnumber)
    )
    SELECT  * FROM  yourdataset d
    WHERE   EXISTS(
            SELECT  1
            FROM    yourdataset d2
            WHERE   d2.caseID = d.CaseID
            AND d2.ActivityName = 'CLSD'
            AND d2.EventNumber >= d.EventNumber
        )
    

    You look "forward" for each caseID until you hit the condition you want, in this case ActivityName='CLSD'.

    I used EventNumber as sequence value, but if your index is on CaseID,FromDateTime, you can use: d2.FromDateTime >= d.FromDateTime

    This works well, unless you have multiple CLSD events.