Search code examples
sqlpostgresqloverlap

Get distinct consecutive date ranges from overlapping date ranges


I need to get a list of date ranges that are NOT overlapping with each other from a list of overlapping dates and get the sum of coins during that overlap. I have tried googling for an example but no luck so far. I might not be using the right key words?

I have a list of overlapping dates

1.1.2018 - 31.1.2018 80
7.1.2018 - 10.1.2018 10
7.1.2018 - 31.1.2018 10
11.1.2018 - 31.1.2018 5
25.1.2018 - 27.1.2018 5
2.2.2018 - 23.2.2018 100

Desired outcome would be

1.1.2018 - 6.7.2018 80 coins
7.1.2018 - 10.1.2018 100 coins
11.1.2018 - 24.1.2018 95 coins
25.1.2018 - 27.1.2018 100 coins
28.1.2018 - 31.1.2018 95 coins
2.2.2018 - 23.2.2018 100 coins

Here is a figure how it should work

|------------------------------|
       |---|
       |-----------------------|
           |-------------------|
                      |---|
                                   |----------------------|
Outcome              
|------|---|----------|---|----|   |----------------------|
   80   100     95     100  95                100

This is my test data

drop table coinsonperiod2;
create table coinsonperiod2(
  id serial,
  startdate date,
  enddate date,
  coins integer,
  userid integer
);
insert into coinsonperiod2 (startdate, enddate, coins,userid) values
  ('2018-01-01','2018-01-31', 80,1)
, ('2018-01-07','2018-01-10', 10,1)
, ('2018-01-07','2018-01-31', 10,1)
, ('2018-01-11','2018-01-31', 5,1)
, ('2018-01-25','2018-01-27', 5,1)
, ('2018-02-02','2018-02-23', 100,2)
, ('2018-01-01','2018-01-31', 80,2)
, ('2018-01-07','2018-01-10', 10,2)
, ('2018-01-07','2018-01-31', 10,2)
, ('2018-01-11','2018-01-31', 5,2)
, ('2018-01-25','2018-01-27', 5,2)
, ('2018-02-02','2018-02-23', 100,3)
; 

UPDATE: Actually StephenM's and joops answers do not meet my desired outcome. Both answers show enddate wrong.

When one period ends the next should start next day (or later if there is a gap). In my desired outcome 1.1.2018-6.1.2018 includes the 6th day. There is no gap between 6th and 7th because 7th is included in 7.1.2018-10.1.2018.

UPDATE2: Now I understood what is the difference between open, half open and closed intervals. In joops solution, calculation must be done against half open intervals, but my desired outcome is closed interval. That is why enddate must be reduced to make the outcome as closed interval. Correct me if I am wrong.

I also added userid in the sample data and modified joops solution some more. Here is the query that gives me my desired outcome.

with changes AS (
  SELECT
    userid,
    startdate AS tickdate,
    coins,
    1         AS cover
  FROM coinsonperiod2
  UNION ALL
  -- add 1 day to correct intervals into half open intervals, so the calculation is correct
  SELECT
    userid,
    1 + enddate AS tickdate,
    -1 * coins,
    -1          AS cover
  FROM coinsonperiod2
)
, sumchanges  AS (
    SELECT
      userid,
      tickdate,
      SUM(coins) AS change,
      SUM(cover) AS cover
    FROM changes
    GROUP BY tickdate, userid
)
, aggregated AS (
    SELECT
      userid   AS userid,
      tickdate AS startdate,
      lead(tickdate)
      over www AS enddate,
      sum(change)
      OVER www AS cash,
      sum(cover)
      OVER www AS cover
    FROM sumchanges
    WINDOW www AS (
      partition by userid
      ORDER BY tickdate )
)
-- reduce 1 day from the enddate to make closed interval
SELECT
userid
, startdate
, enddate-1 as enddate
, cash
, cover
FROM aggregated
WHERE cover > 0
ORDER BY userid, startdate
;

Outcome: Outcome


Solution

  • The logic is:

    • at the beginning of an interval add its value to a cumulative sum
    • at the end of an interval substract its value from this sum
    • but in order to sweep the dateline, we'll have to collect al the (unique) date/time stamps, either start or stop.

    So the point is: convert the data from a series of intervals to a series of (start/stop) events, and aggregate over these.


    -- \i tmp.sql
    
    create table coinsonperiod(
      id serial,
      startdate date,
      enddate date,
      coins integer
    );
    insert into coinsonperiod (startdate, enddate, coins) values
      ('2018-01-01','2018-01-31', 80)
    , ('2018-01-07','2018-01-10', 10)
    , ('2018-01-07','2018-01-31', 10)
    , ('2018-01-11','2018-01-31', 5)
    , ('2018-01-25','2018-01-27', 5)
    , ('2018-02-02','2018-02-23', 100)
            ;
    
    WITH changes AS (
        SELECT startdate AS tickdate , coins
                , 1 AS cover
        FROM coinsonperiod
        UNION ALL
        -- add 1 day to convert to half-open intervals
        SELECT 1+enddate AS tickdate, -1* coins
                , -1 AS cover
        FROM coinsonperiod
        )
    , sumchanges  AS (
            SELECT tickdate, SUM(coins) AS change, SUM(cover) AS cover
            FROM changes
            GROUP BY tickdate
            )
    , aggregated AS (
            SELECT
            tickdate AS startdate
            , lead(tickdate) over www AS enddate
            , sum(change) OVER www AS cash
              -- number of covered intervals
            , sum(cover) OVER www AS cover
            FROM sumchanges
            WINDOW www AS (ORDER BY tickdate)
            )
                 -- substract one day from enddate to correct back to closed intervals
    SELECT startdate, enddate-1 AS enddate, cash, cover
    FROM aggregated
    WHERE cover > 0
    ORDER BY startdate
            ;