I am trying to create groups of 60-day date windows in MS SQL Server. The starting index point for each group would need to be relative to the "start" of a new 60-day period.
For example, Person A below has seven events with the start and end dates below.
Event #1 is in group 3/6/2018 (1/5/2018 + 60)
Event #2 is also in group 3/6/2018 because the event is within the first 60-day window (2/2/2018 <=3/6/2018)
Event #3 is in group 8/29/2018 (the start date of 6/5/2019 is outside of the first 60-day window, so a new 60-day window is starting (6/30/2018 + 60)
A B C D E
1 Individual Event StartDate EndDate Groups
2 Person A Event #1 1/1/2018 1/5/2018 3/6/2018
3 Person A Event #2 2/2/2018 2/13/2018 3/6/2018
4 Person A Event #3 6/5/2018 6/30/2018 8/29/2018
5 Person A Event #4 9/5/2020 9/5/2020 11/4/2020
6 Person A Event #5 3/3/2021 3/5/2021 5/4/2021
7 Person A Event #7 4/1/2021 4/5/2021 5/4/2021
In Excel this could be accomplished by running the formulas below.
Cell E2 = D2 + 60 Cell E3 = IF(C3 <= E2,E2,D3+60) Cells E4:E7 = Dragging the formula from E3 down
Is there a formula in the server that could accomplish this grouping?
I tried running the below, but then run into the issue of the lag function 1) not being able to exclude NULL values and 2) not being able to replace the value found in the previous row as would happen in Excel.
WITH Part1 AS (
SELECT t.* ,CASE WHEN LAG(t.EndDate) OVER (PARTITION BY t.Individual ORDER BY t.EndDate) IS NULL THEN DATEADD(d,60,t.EndDate) ELSE NULL END AS Ref
FROM Table t
), Part2 AS (
SELECT p1.* ,CASE WHEN LAG(p1.Ref) OVER(PARTITION BY p1.Individual ORDER BY (t.EndDate) IS NOT NULL THEN p1.Ref WHEN p1.StartDate <= LAG(p1.Ref) OVER(PARTITION BY p1.Individual ORDER BY (t.EndDate) THEN LAG(p1.Ref) OVER(PARTITION BY p1.Individual ORDER BY (t.EndDate) ELSE DATEADD(d,60,p1.EndDate) END AS Groups
FROM Part1 AS p1
)
EDIT:
I tried using the example in a different dataset. But, it doesn't seem to "reset" to a new "index" date when start dates are closer.
SELECT *
, MAX(DATEADD(DAY, 60, EndDate)) OVER(PARTITION BY Individual, grouping) AS GroupActual
FROM (
SELECT *
, SUM(matchingDays) OVER(PARTITION BY Individual ORDER BY StartDate) AS grouping
FROM (
SELECT *
, CASE WHEN DATEDIFF(DAY, lag(enddate, 1, StartDate) OVER(PARTITION BY Individual ORDER BY StartDate), StartDate) < 60 THEN 0 ELSE 1 END AS matchingDays
FROM #t t
) x
) x
A B C D E F
1 Individual Event StartDate EndDate Expected Actual
2 Person A Event #1 8/10/2020 8/13/2020 10/12/2020 10/12/2020
3 Person A Event #2 10/16/2020 10/18/2020 12/17/2020 3/19/2021
4 Person A Event #3 10/20/2020 10/23/2020 12/17/2020 3/19/2021
5 Person A Event #4 12/20/2020 12/22/2020 2/20/2021 3/19/2021
6 Person A Event #5 12/27/2020 12/28/2020 2/20/2021 3/19/2021
7 Person A Event #6 1/16/2021 1/18/2021 2/20/2021 3/19/2021
Here's a possible solution:
SELECT Individual, Event
, CAST(startdate AS DATE) AS StartDate
, CAST(enddate AS DATE) AS EndDate
, CAST(Groups AS DATE) AS Expected
INTO #t
FROM (
VALUES (N'Person A','Event #1','1/1/2018','1/5/2018' ,'3/6/2018')
, (N'Person A','Event #2','2/2/2018','2/13/2018','3/6/2018')
, (N'Person A','Event #3','6/5/2018','6/30/2018','8/29/2018')
, (N'Person A','Event #4','9/5/2020','9/5/2020' ,'11/4/2020')
, (N'Person A','Event #5','3/3/2021','3/5/2021' ,'5/4/2021')
, (N'Person A','Event #7','4/1/2021','4/5/2021' ,'5/4/2021')
) t (Individual,Event,StartDate,EndDate,Groups)
select *
, DATEADD(DAY, 60, (
SELECT TOP 1 prev.Enddate
FROM #t prev
WHERE prev.Individual = t.Individual
AND DATEADD(DAY, 60, prev.enddate) > t.StartDate
ORDER BY prev.EndDate)
) AS GroupActual
FROM #t t
First I convert strings to proper datatypes.
Then, for every row, fetch the first EndDate which corresponds to the grouping logic, ie. within 60 days of the current row's StartDate. There will be always at least one match, either some prior date or against itself, which creates the new "group".
Finally, i add 60 days to this date to create the grouping date.
EDIT:
Version that groups by 60 days since previous:
SELECT *
, MAX(DATEADD(DAY, 60, EndDate)) OVER(PARTITION BY Individual, grouping) AS GroupActual
FROM (
SELECT *
, SUM(matchingDays) OVER(PARTITION BY Individual ORDER BY StartDate) AS grouping
FROM (
SELECT *
, CASE WHEN DATEDIFF(DAY, lag(enddate, 1, StartDate) OVER(PARTITION BY Individual ORDER BY StartDate), StartDate) < 60 THEN 0 ELSE 1 END AS matchingDays
FROM #t t
) x
) x
This one uses LAG to get previous Enddate. If it's within 60 days, i don't increment a flag that says which group we're at. This allows us to create a rolling group counter. Every time we're above 60 days, we start a new group.
Finally, you get the maximum EndDate for each group and adds 60 days to it. It's all pretty standard stuff