Search code examples
sql-serverpartition-by

Creating Groups based on if Date Period Connects with Previous Row


I have test data like the following:

DROP TABLE IF EXISTS #test;
CREATE TABLE #test (id INT, start_date DATE,  end_date DATE)
INSERT INTO
    #test
VALUES 
    (1, '2023-01-01', '2024-01-01'),
    (1, '2023-05-01', '2024-07-01'),
    (1, '2025-01-01', '2026-01-01');

SELECT * 
FROM #test;

I am trying to create integer groups that check if the prior row connects with the current. If it doesn't connect, I want to increment the integer to notate that its a separate group.
I have tried below to check if the current row connects with the prior based on if the current period start_date comes before the prior periods end date. this would signal that the time periods connect and be treated as the same group.

WITH connects AS (
    SELECT
        id,
        CASE
            WHEN LAG(id, 1) OVER (PARTITION BY id ORDER BY start_date) = id AND (
                start_date <= LAG(end_date, 1) OVER (PARTITION BY id ORDER BY start_date)
                OR
                end_date <= LAG(end_date, 1) OVER (PARTITION BY id ORDER BY start_date)
            ) THEN 1
            WHEN LAG(id, 1) OVER (PARTITION BY id ORDER BY start_date) IS NULL THEN 1
            ELSE 0
        END AS connects_flag
    FROM
        #test
)
SELECT
    *
FROM
    connects;

Desired outcome:

id start_date end_date grp
1 2023-01-01 2024-01-01 1
1 2023-05-01 2024-07-01 1
1 2025-01-01 2026-01-01 2

Solution

  • Here's one way to do it to with cumulative aggregation:
    In the common table expression I've used a combination of lag and iif to create a column that will contain 1 whenever the previous row's end date is either missing or smaller than the current row's start date, and 0 otherwise (so 0 when the previous row overlaps with the current, 1 if it doesn't).
    Then all that's left to do is to commulatively sum the values of that column.
    I've also added some more rows to your sample table:

    DROP TABLE IF EXISTS #test;
    CREATE TABLE #test (id INT, start_date DATE,  end_date DATE)
    INSERT INTO
        #test
    VALUES 
        (1, '2023-01-01', '2024-01-01'),
        (1, '2023-05-01', '2024-07-01'),
        (1, '2024-05-01', '2024-07-01'),
        (1, '2024-08-01', '2024-09-01'),
        (1, '2025-01-01', '2026-01-01'),
        (1, '2026-05-01', '2026-07-01');
        
    
    WITH CTE AS
    (
        SELECT id
             , start_date
             , end_date
             , IIF(
                 ISNULL(
                     LAG(end_date) OVER(PARTITION BY id ORDER BY start_date)
                 , '0001-01-01') < start_date
                 , 1, 0) As grp
        FROM #test
    )
    
    SELECT id
         , start_date
         , end_date
         , SUM(grp) OVER(PARTITION BY id ORDER BY start_date) DateGroup
    FROM CTE;
    
    
    

    Results:

    id start_date end_date DateGroup
    1 2023-01-01 2024-01-01 1
    1 2023-05-01 2024-07-01 1
    1 2024-05-01 2024-07-01 1
    1 2024-08-01 2024-09-01 2
    1 2025-01-01 2026-01-01 3
    1 2026-05-01 2026-07-01 4

    You can see a live demo on db<>fiddle