Search code examples
sqlsql-serverdatetimegaps-and-islands

Combine periods by user


I want to combine periods(start -> end dates) by the intervals they overlap into a single interval by user.

+---------+-------------+------------+------------+
| USER_ID | CONTRACT_ID | START_DATE |  END_DATE  |
+---------+-------------+------------+------------+
|       1 |          14 | 18.02.2021 | 18.04.2022 |
|       1 |          13 | 01.01.2019 | 01.01.2020 |
|       1 |          12 | 01.01.2018 | 01.01.2019 |
|       1 |          11 | 13.02.2017 | 13.02.2019 |
|       2 |          23 | 19.06.2021 | 18.04.2022 |
|       2 |          22 | 01.07.2019 | 01.07.2020 |
|       2 |          21 | 19.01.2019 | 19.01.2020 |
+---------+-------------+------------+------------+

And as a result I want table like this:

+---------+------------+------------+
| USER_ID | START_DATE |  END_DATE  |
+---------+------------+------------+
|       1 | 18.02.2021 | 18.04.2022 |
|       1 | 13.02.2017 | 01.01.2020 |
|       2 | 19.06.2021 | 18.04.2022 |
|       2 | 19.01.2019 | 01.07.2020 |
+---------+------------+------------+

I tried different options but nothing seems to work.


Solution

  • WITH DATAD (USER_ID,CONTRACT_ID,START_DATE,END_DATE)AS
    (
      SELECT 1,14,'2021-02-18','2022-04-18' UNION ALL
      SELECT 1,13,'2019-01-01','2020-01-01' UNION ALL
      SELECT 1,12,'2018-01-01','2019-01-01' UNION ALL
      SELECT 1,11,'2017-02-13','2019-02-13' UNION ALL
      SELECT 2,23,'2021-06-19','2022-04-18' UNION ALL
      SELECT 2,22,'2019-07-01','2020-07-01' UNION ALL
      SELECT 2,21,'2019-01-19','2020-01-19' 
    ),
    CTE AS
    (
      SELECT USER_ID,START_DATE AS TS ,1 AS TYPE,
      ROW_NUMBER()OVER(PARTITION BY USER_ID ORDER BY START_DATE) AS S,NULL AS E 
      FROM DATAD
    
       UNION ALL
    
     SELECT USER_ID,END_DATE AS TS,-1 AS TYPE,
     NULL,ROW_NUMBER()OVER(PARTITION BY USER_ID ORDER BY END_DATE)AS E
     FROM DATAD
    ),
    C2 AS
    (
      SELECT *,ROW_NUMBER()OVER(PARTITION BY USER_ID ORDER BY TS,TYPE DESC)AS SE 
      FROM CTE 
    ),
    C3 AS 
    (
     SELECT *,
      FLOOR((ROW_NUMBER()OVER(PARTITION BY USER_ID ORDER BY TS)+1)/2)AS P
     FROM C2
     CROSS APPLY (VALUES(S-(SE-S)-1,(SE-E)-E)) AS A(CS,CE)
     WHERE CS=0 OR CE=0
    )
    SELECT USER_ID,MIN(TS)AS START_TIME,MAX(TS)AS END_TIME
    FROM C3
    GROUP BY USER_ID,P
    ORDER BY USER_ID;
    

    This is a "copy-paste" from the book "T-SQL Querying" by Itzik Ben-Gan. Paragraph about "packing intervals"