Search code examples
t-sqlgaps-and-islands

How to collapse overlapping date periods with acceptable gaps using T-SQL?


We want to group our members' enrollments into "continuous enrollments," allowing for a gap of up to 45 days. I know how to use LEAD to determine if an enrollment should be grouped with the next, but I don't know how to group them. Would it be more appropriate to add 45 to the term date and subtract 45 from the effective date, then check for overlapping date periods? My goal is to have a SQL view that returns the results similar to the final query below. Thank you for your help.

SELECT '101' AS MemID, '2021-01-01' AS EffDate, '2021-01-31' AS TermDate INTO #T1 UNION
SELECT '101', '2021-02-01', '2021-02-28' UNION
SELECT '101', '2021-03-01', '2021-03-31' UNION
SELECT '101', '2021-06-01', '2021-06-30' UNION
SELECT '999', '2021-01-01', '2021-01-15' UNION
SELECT '999', '2021-09-01', '2021-09-28' UNION
SELECT '999', '2021-10-01', '2021-10-31' 

SELECT *
, LEAD(EffDate) OVER (PARTITION BY MemID ORDER BY EffDate) AS LeadEffDate 
, DATEDIFF(DAY, TermDate, (LEAD(EffDate) OVER (PARTITION BY MemID ORDER BY EffDate))) AS DaysToNextEnrollment
, CASE WHEN (DATEDIFF(DAY, TermDate, (LEAD(EffDate) OVER (PARTITION BY MemID ORDER BY EffDate)))) <= 45 THEN 1 ELSE 0 END AS CombineWithNextRecord
FROM #T1

-- result objective
SELECT 101 AS MemID, '2021-01-01' AS EffDate, '2021-03-31' AS TermDate UNION
SELECT 101, '2021-06-01', '2021-06-30' UNION
SELECT 999, '2021-01-01', '2021-01-15' UNION
SELECT 999, '2021-09-01', '2021-10-31'

Solution

  • I think you are really close. Your question is very similar to TSQL - creating from-to date table while ignoring in-between steps with conditions with a logic difference on what you want to consider to be the same group.

    My basic approach is to use the LAG() function to figure out the previous values for MemID and TermDate and combine that with your 45 day rule to define a group. And finally get the first and last values of each group.

    Here is my response to that question modified to your situation.

    SELECT
          a4.MemID
        , CONVERT (DATE, a4.First_EffDate) AS [EffDate]
        , CONVERT (DATE, a4.TermDate)      AS [TermDate]
    FROM  (
              SELECT
                   a3.MemID
                 , a3.EffDate
                 , a3.TermDate
                 , a3.MemID_group
                 , FIRST_VALUE (a3.EffDate) OVER (PARTITION BY a3.MemID_group ORDER BY a3.EffDate) AS [First_EffDate]
                 , ROW_NUMBER () OVER (PARTITION BY a3.MemID_group ORDER BY a3.EffDate DESC)       AS [Row_number]
              FROM (
                       SELECT
                            a2.MemID
                          , a2.EffDate
                          , a2.TermDate
                          , a2.Previous_MemID
                          , a2.Previous_TermDate
                          , a2.New_group
                          , SUM (a2.New_group) OVER (ORDER BY a2.MemID, a2.EffDate) AS [MemID_group]
                       FROM (
                                SELECT
                                     a1.MemID
                                   , a1.EffDate
                                   , a1.TermDate
                                   , a1.Previous_MemID
                                   , a1.Previous_TermDate
                                   ---------------------------------------------------------------------------------
                                   -- new group if the MemID is different from the previous row OR
                                   -- if the MemID is the same as the previous row AND it has been more than 45 days
                                   -- between the TermDate of the previous row and the EffDate of the current row
                                   , 
                                   IIF((a1.MemID <> a1.Previous_MemID)
                                      OR (
                                             a1.MemID = a1.Previous_MemID
                                         AND DATEDIFF (DAY, a1.Previous_TermDate, a1.EffDate) > 45
                                         )
                                       , 1
                                       , 0) AS [New_group]
                                   ---------------------------------------------------------------------------------
                                FROM (
                                         SELECT
                                              MemID
                                            , EffDate
                                            , TermDate
                                            , LAG (MemID) OVER (ORDER BY MemID)                         AS [Previous_MemID]
                                            , LAG (TermDate) OVER (PARTITION BY MemID ORDER BY EffDate) AS [Previous_TermDate]
                                         FROM #T1
                                     ) a1
                            ) a2
                   ) a3
          ) a4
    WHERE a4.[Row_number] = 1;
    

    Here is the dbfiddle.