Search code examples
sqlsql-serverdata-management

How to combine date ranges in SQL with small gaps


I have a dataset where each row has a date range. I want to combine records into single date ranges if they overlap or there's a gap of less than 30 days and they share the same ID number. If it's more than 30 days, I want them to remain separate. I can figure out how to do it if they are overlapping, and I can figure out how to do it no matter the size of the gap, but I can't figure out how to do it with a limited gap allowance.

So, for example, if my data looks like this:

ID        Date1           Date2
ABC       2018-01-01      2018-02-14
ABC       2018-02-13      2018-03-17
ABC       2018-04-01      2018-07-24
DEF       2017-01-01      2017-06-30
DEF       2017-10-01      2017-12-01

I want it to come out like this:

ID        Date1           Date2
ABC       2018-01-01      2018-07-24
DEF       2017-01-01      2017-06-30
DEF       2017-10-01      2017-12-01

The three date ranges for ABC are combined, because they either overlap or the gaps are less than 30 days. The two date ranges for DEF stay separate, because the gap between them is larger than 30 days.

I'm using Microsoft SSMS.


Solution

  • You can identify where the new periods begin. For a general problem, I would go with not exists. Then you can assign a group using cumulative sums:

    select id, sum(is_start) over (partition by id order by datestart) as grp
    from (select t.*,
                 (case when not exists (select 1
                                        from t t2
                                        where t2.id = t.id and
                                              t2.date1 >= dateadd(day, -30, t1.date1) and
                                              t2.date2 < dateadd(day, 30, t1.date2)
                                       )
                       then 1 else 0
                  end) as is_start
          from t
         ) t;
    

    The final step is aggregation:

    with g as (
          select id, sum(is_start) over (partition by id order by datestart) as grp
          from (select t.*,
                       (case when not exists (select 1
                                              from t t2
                                              where t2.id = t.id and
                                                    t2.date1 >= dateadd(day, -30, t1.date1) and
                                                    t2.date2 < dateadd(day, 30, t1.date2)
                                             )
                             then 1 else 0
                        end) as is_start
                from t
               ) t
          )
    select id, min(date1), max(date2)
    from g
    group by id, grp;