Search code examples
postgresqldateoverlapdate-rangecoalesce

Coalescing date ranges in postgres to eliminate overlaps


If I have a postgres table a:

  member |    start   |    end
---------+------------+------------
    1    | 2015-01-01 | 2015-05-01
---------+------------+------------
    1    | 2015-03-01 | 2015-06-01
---------+------------+------------
    2    | 2015-01-01 | 2015-05-01
---------+------------+------------
    2    | 2015-06-01 | 2015-08-01

How would I coalesce dates to eliminate overlapping ranges like this:

  member |    start   |    end
---------+------------+------------
    1    | 2015-01-01 | 2015-06-01
---------+------------+------------
    2    | 2015-01-01 | 2015-05-01
---------+------------+------------
    2    | 2015-06-01 | 2015-08-01

Solution

  • In the chop CTE original ranges are "chopped" into smaller, non-intersecting (but possibly adjacent) ranges. They are constructed from all the end points of the original ranges, both start and finish.

    Main select works as follows (read it from the inside out):

    1. Adjacent flag for a range is zero when it adjoins the previous range (assuming that ranges are ordered by their start dates).
    2. A cumulative sum of adjacent flags gives us a grouping value: all the adjacent ranges will have the same sum.
    3. Outermost block simply calculates the bounding values for the adjacent groups of ranges.

    Black magic of window functions...

    with chop as (
      select member,
             pt as start,
             lead(pt) over (partition by member order by pt) finish,
             (
               select count(*)
               from   a
               where  b.member = a.member
               and    b.pt >= a.start
               and    b.pt < a.finish
             ) need_it
      from   (
               select member, start pt from a
               union
               select member, finish pt from a
             ) b
    )
    -- 3
    select member,
           min(start),
           max(finish)
    from   (
             -- 2
             select member,
                    start,
                    finish,
                    sum(adjacent) over (partition by member order by start) grp
             from   (
                      -- 1
                      select member,
                             start,
                             finish,
                             case
                               when start <= lag(finish) over (partition by member order by start)
                               then 0
                               else 1
                             end adjacent
                      from   chop
                      where  need_it > 0
                    ) t
           ) q
    group by member,
             grp
    order by member,
             min(start);
    

    I renamed end to finish because end is a keyword.