Search code examples
sql-servergroupingdate-rangegaps-and-islands

Relative 60-Day Groups in SQL Server


I am trying to create groups of 60-day date windows in MS SQL Server. The starting index point for each group would need to be relative to the "start" of a new 60-day period.

For example, Person A below has seven events with the start and end dates below.

Event #1 is in group 3/6/2018 (1/5/2018 + 60)
Event #2 is also in group 3/6/2018 because the event is within the first 60-day window (2/2/2018 <=3/6/2018)
Event #3 is in group 8/29/2018 (the start date of 6/5/2019 is outside of the first 60-day window, so a new 60-day window is starting (6/30/2018 + 60)

    A           B           C           D           E
1   Individual  Event       StartDate   EndDate     Groups
2   Person A    Event #1    1/1/2018    1/5/2018    3/6/2018
3   Person A    Event #2    2/2/2018    2/13/2018   3/6/2018
4   Person A    Event #3    6/5/2018    6/30/2018   8/29/2018
5   Person A    Event #4    9/5/2020    9/5/2020    11/4/2020
6   Person A    Event #5    3/3/2021    3/5/2021    5/4/2021
7   Person A    Event #7    4/1/2021    4/5/2021    5/4/2021

In Excel this could be accomplished by running the formulas below.

Cell E2 = D2 + 60 Cell E3 = IF(C3 <= E2,E2,D3+60) Cells E4:E7 = Dragging the formula from E3 down

Is there a formula in the server that could accomplish this grouping?

I tried running the below, but then run into the issue of the lag function 1) not being able to exclude NULL values and 2) not being able to replace the value found in the previous row as would happen in Excel.

WITH Part1 AS (
    SELECT t.* ,CASE  WHEN LAG(t.EndDate) OVER (PARTITION BY t.Individual ORDER BY t.EndDate) IS NULL THEN DATEADD(d,60,t.EndDate) ELSE NULL END AS Ref
    FROM Table t
), Part2 AS (
    SELECT p1.* ,CASE WHEN LAG(p1.Ref) OVER(PARTITION BY p1.Individual ORDER BY (t.EndDate) IS NOT NULL THEN p1.Ref WHEN p1.StartDate <= LAG(p1.Ref) OVER(PARTITION BY p1.Individual ORDER BY (t.EndDate) THEN     LAG(p1.Ref) OVER(PARTITION BY p1.Individual ORDER BY (t.EndDate) ELSE DATEADD(d,60,p1.EndDate) END AS Groups
    FROM Part1 AS p1
)

EDIT:

I tried using the example in a different dataset. But, it doesn't seem to "reset" to a new "index" date when start dates are closer.


SELECT  *
,   MAX(DATEADD(DAY, 60, EndDate)) OVER(PARTITION BY Individual, grouping) AS GroupActual
FROM    (
    SELECT  *
    ,   SUM(matchingDays) OVER(PARTITION BY Individual ORDER BY StartDate) AS grouping
    FROM    (
        SELECT  *
        ,   CASE WHEN DATEDIFF(DAY, lag(enddate, 1, StartDate) OVER(PARTITION BY Individual ORDER BY StartDate), StartDate) < 60 THEN 0 ELSE 1 END AS matchingDays
        FROM    #t t
        ) x
    ) x
    A           B           C           D           E           F           
1   Individual  Event       StartDate   EndDate     Expected    Actual
2   Person A    Event #1    8/10/2020   8/13/2020   10/12/2020  10/12/2020
3   Person A    Event #2    10/16/2020  10/18/2020  12/17/2020  3/19/2021
4   Person A    Event #3    10/20/2020  10/23/2020  12/17/2020  3/19/2021
5   Person A    Event #4    12/20/2020  12/22/2020  2/20/2021   3/19/2021
6   Person A    Event #5    12/27/2020  12/28/2020  2/20/2021   3/19/2021
7   Person A    Event #6    1/16/2021   1/18/2021   2/20/2021   3/19/2021

Solution

  • Here's a possible solution:

    SELECT  Individual, Event
    ,   CAST(startdate AS DATE) AS StartDate
    ,   CAST(enddate AS DATE) AS EndDate
    ,   CAST(Groups AS DATE) AS Expected
    INTO    #t
    FROM (
        VALUES  (N'Person A','Event #1','1/1/2018','1/5/2018' ,'3/6/2018')
        ,   (N'Person A','Event #2','2/2/2018','2/13/2018','3/6/2018')
        ,   (N'Person A','Event #3','6/5/2018','6/30/2018','8/29/2018')
        ,   (N'Person A','Event #4','9/5/2020','9/5/2020' ,'11/4/2020')
        ,   (N'Person A','Event #5','3/3/2021','3/5/2021' ,'5/4/2021')
        ,   (N'Person A','Event #7','4/1/2021','4/5/2021' ,'5/4/2021')
    ) t (Individual,Event,StartDate,EndDate,Groups)
    
    select  *
    ,   DATEADD(DAY, 60, (
                    SELECT TOP 1 prev.Enddate
                    FROM #t prev
                    WHERE prev.Individual = t.Individual
                    AND DATEADD(DAY, 60, prev.enddate) > t.StartDate
                    ORDER BY prev.EndDate)
        ) AS GroupActual
    FROM    #t t
    

    First I convert strings to proper datatypes.

    Then, for every row, fetch the first EndDate which corresponds to the grouping logic, ie. within 60 days of the current row's StartDate. There will be always at least one match, either some prior date or against itself, which creates the new "group".

    Finally, i add 60 days to this date to create the grouping date.

    EDIT:

    Version that groups by 60 days since previous:

    
    SELECT  *
    ,   MAX(DATEADD(DAY, 60, EndDate)) OVER(PARTITION BY Individual, grouping) AS GroupActual
    FROM    (
        SELECT  *
        ,   SUM(matchingDays) OVER(PARTITION BY Individual ORDER BY StartDate) AS grouping
        FROM    (
            SELECT  *
            ,   CASE WHEN DATEDIFF(DAY, lag(enddate, 1, StartDate) OVER(PARTITION BY Individual ORDER BY StartDate), StartDate) < 60 THEN 0 ELSE 1 END AS matchingDays
            FROM    #t t
            ) x
        ) x
    

    This one uses LAG to get previous Enddate. If it's within 60 days, i don't increment a flag that says which group we're at. This allows us to create a rolling group counter. Every time we're above 60 days, we start a new group.

    Finally, you get the maximum EndDate for each group and adds 60 days to it. It's all pretty standard stuff