Search code examples
sql-servert-sqlwindow-functions

Select first occurrence of a value per group with condition based on the most recent record of group


I currently have a credit risk default table where default status of parties are stored. I have a business requirement to get the earliest date where default has occurred and where the most recent record has not changed status to Non-Default. I have provided some sample data to illustrate the scenario.

--Create table 
CREATE TABLE #CreditDefaults
(Party_Id nvarchar(10) null,
Party_Data_Source_Code nvarchar(3) null,
Credit_Risk_Status_Type_Group   nvarchar(10) null,
Credit_Risk_Status_Type  nvarchar(15) null,
Effective_Timestamp datetime null
)

--add some dummy data

INSERT INTO #CreditDefaults 
(Party_Id,
Party_Data_Source_Code,
Credit_Risk_Status_Type_Group,
Credit_Risk_Status_Type,
Effective_Timestamp
)

VALUES
 ('60005400',   'MDS',  'Default',  'Default',      '2014-12-01 00:10:00.000')
,('60005400',   'MDS',  'Default',  'Default',      '2021-06-04 00:00:01.000')
,('60054040',   'MDS',  'Default',  'Default',      '2014-07-14 00:10:00.000')
,('60054040',   'MDS',  'Default',  'Default',      '2014-07-14 17:36:25.000')
,('60054040',   'MDS',  'Default',  'Default',      '2021-04-15 00:00:01.000')
,('60005495',   'MDS',  'Default',  'Default',      '2019-02-17 00:00:00.000')
,('60005495',   'MDS',  'Default',  'Non-Default',  '2022-03-25 00:00:00.000')
,('60005660',   'MDS',  'Default',  'Default',      '2022-05-24 13:27:47.000')
,('60005672',   'MDS',  'Default',  'Non-Default',  '2020-06-26 00:00:00.000')

SELECT * FROM #CreditDefaults 

For the data above I just need to select the following records:

60005400    MDS Default Default 2014-12-01 00:10:00.000
60054040    MDS Default Default 2021-04-15 00:00:01.000
60005660    MDS Default Default 2022-05-24 13:27:47.000

Any help would be much appreciated


Solution

  • Depending on the expected answer for the Default > Non-Default > Default scenario, the following might be the solution.

    WITH cte AS
    (
       SELECT *,
             ROW_NUMBER()
                 OVER (PARTITION BY Party_Id, Credit_Risk_Status_Type
                       ORDER BY Effective_Timestamp)
                 AS rn,
             LAST_VALUE(Credit_Risk_Status_Type)
                 OVER (PARTITION BY Party_Id
                       ORDER BY Effective_Timestamp
                       ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
                 AS LastStatus
       FROM #CreditDefaults 
    )
    SELECT *
    FROM cte
    WHERE rn = 1
    AND Credit_Risk_Status_Type = 'Default'
    AND LastStatus <> 'Non-Default'
    ORDER BY Party_Id, Effective_Timestamp
    

    The LAST_VALUE() window function is used to identify the latest status value, while ROW_NUMBER() is used to select the earliest row for each status. Filtering for (1) status = 'Default', (2) row number = 1, and (3) latest-status <> 'Non-Default' gives us the desired results.

    See this db<>fiddle for a demo with some extra test data.