Search code examples
sqlsql-serversql-server-2014

Fill gaps in SQL Server dates ranges


In SQL Server 2014 I have a Periods table that looks like the following:

| PeriodId | PeriodStart | PeriodEnd  |
---------------------------------------
| 202005   | 2020-05-01  | 2020-05-31 |
| 202006   | 2020-06-01  | 2020-06-30 |

A period won't always be from the first to the last day of the month.

Then I have an Activities table, which has some activities the user has programmed:

| ActivityId | UserId | ActivityStart | ActivityEnd |
-----------------------------------------------------
| 1          | A      | 2020-05-20    | 2020-06-05  |
| 2          | A      | 2020-06-15    | 2020-06-18  |
| 3          | B      | 2020-06-10    | 2020-06-25  |

There can be gaps between the activities of a user, but the same user will never have overlaping activities.

Now I need a query that limits the activities dates ranges to the start and end of the period, and fills the gaps to complete the period. I'll always filter by PeriodId, so I'll just put the example result for PeriodId = 202006:

| PeriodId | UserId | ActivityId | NewActivityStart | NewActivityEnd |
----------------------------------------------------------------------
| 202006   | A      | 1          | 2020-06-01       | 2020-06-05     |  --Part of ActivityId 1
| 202006   | A      | NULL       | 2020-06-06       | 2020-06-14     |  --Fill between activities 1 and 2
| 202006   | A      | 2          | 2020-06-15       | 2020-06-18     |
| 202006   | A      | NULL       | 2020-06-19       | 2020-06-30     |  --Fill until end of period
| 202006   | B      | NULL       | 2020-06-01       | 2020-06-09     |  --Fill from start of period
| 202006   | B      | 3          | 2020-06-10       | 2020-06-25     |
| 202006   | B      | NULL       | 2020-06-26       | 2020-06-30     |  --Fill until end of period

I've been able to contain the activity dates within the period with the following query:

SELECT p.PeriodId, a.UserId, a.ActivityId
       IIF(p.PeriodStart > a.ActivityStart, p.PeriodStart, a.ActivityStart) AS NewActivityStart,
       IIF(p.PeriodEnd < a.ActivityEnd, p.PeriodEnd, a.ActivityEnd) AS NewActivityEnd
FROM Periods p
JOIN Activities a ON a.ActivityStart <= p.PeriodEnd AND a.ActivityEnd >= p.PeriodStart

But I haven't been able to fill the gaps in the ranges. I've tried with a correlative dates table and/or with Window Functions like LAG/LEAD.

I feel like Window Functions could be the solution, and I've tried to follow examples about gaps/islands, but I just haven't been able to understand them well enough to make it work.

Is there a way to complete the query to fill the missing gaps? Are there other ways to achieve this in a query?


Solution

  • I don't think this is that complicated. If you expand the periods into individual dates and do a left join, then this becomes a gaps-and-islands problem:

    with dates as (
          select periodid, periodstart as dte, periodend
          from periods
          union all
          select periodid, dateadd(day, 1, dte), periodend
          from dates
          where dte < periodend
         )
    select userid, activityid, min(dte), max(dte)
    from (select d.dte, d.periodid, u.userid, a.activityid,
                 row_number() over (partition by u.userid, a.activityid order by d.dte) as seqnum
          from dates d cross join
               (select distinct userid from activities) u left join
               activities a
               on a.userid = u.userid and
                  a.activitystart <= d.dte and a.activityend >= d.dte
         ) da
    group by userid, activityid, periodid, dateadd(day, -seqnum, dte)
    order by userid, min(dte);
    

    Here is a db<>fiddle.

    Note: This produces results for all users and all periods -- which seems reasonable given your description. It is pretty simple to modify to filter out users with no activity during a given period.

    Also, this does not go to the end of the month. Instead, it includes the complete periods. I don't see why months would play into this -- except to confuse matters -- consider if two periods have days in the same month, for instance.