In SQL Server 2014 I have a Periods
table that looks like the following:
| PeriodId | PeriodStart | PeriodEnd |
---------------------------------------
| 202005 | 2020-05-01 | 2020-05-31 |
| 202006 | 2020-06-01 | 2020-06-30 |
A period won't always be from the first to the last day of the month.
Then I have an Activities
table, which has some activities the user has programmed:
| ActivityId | UserId | ActivityStart | ActivityEnd |
-----------------------------------------------------
| 1 | A | 2020-05-20 | 2020-06-05 |
| 2 | A | 2020-06-15 | 2020-06-18 |
| 3 | B | 2020-06-10 | 2020-06-25 |
There can be gaps between the activities of a user, but the same user will never have overlaping activities.
Now I need a query that limits the activities dates ranges to the start and end of the period, and fills the gaps to complete the period. I'll always filter by PeriodId
, so I'll just put the example result for PeriodId = 202006
:
| PeriodId | UserId | ActivityId | NewActivityStart | NewActivityEnd |
----------------------------------------------------------------------
| 202006 | A | 1 | 2020-06-01 | 2020-06-05 | --Part of ActivityId 1
| 202006 | A | NULL | 2020-06-06 | 2020-06-14 | --Fill between activities 1 and 2
| 202006 | A | 2 | 2020-06-15 | 2020-06-18 |
| 202006 | A | NULL | 2020-06-19 | 2020-06-30 | --Fill until end of period
| 202006 | B | NULL | 2020-06-01 | 2020-06-09 | --Fill from start of period
| 202006 | B | 3 | 2020-06-10 | 2020-06-25 |
| 202006 | B | NULL | 2020-06-26 | 2020-06-30 | --Fill until end of period
I've been able to contain the activity dates within the period with the following query:
SELECT p.PeriodId, a.UserId, a.ActivityId
IIF(p.PeriodStart > a.ActivityStart, p.PeriodStart, a.ActivityStart) AS NewActivityStart,
IIF(p.PeriodEnd < a.ActivityEnd, p.PeriodEnd, a.ActivityEnd) AS NewActivityEnd
FROM Periods p
JOIN Activities a ON a.ActivityStart <= p.PeriodEnd AND a.ActivityEnd >= p.PeriodStart
But I haven't been able to fill the gaps in the ranges. I've tried with a correlative dates table and/or with Window Functions like LAG/LEAD.
I feel like Window Functions could be the solution, and I've tried to follow examples about gaps/islands, but I just haven't been able to understand them well enough to make it work.
Is there a way to complete the query to fill the missing gaps? Are there other ways to achieve this in a query?
I don't think this is that complicated. If you expand the periods into individual dates and do a left join
, then this becomes a gaps-and-islands problem:
with dates as (
select periodid, periodstart as dte, periodend
from periods
union all
select periodid, dateadd(day, 1, dte), periodend
from dates
where dte < periodend
)
select userid, activityid, min(dte), max(dte)
from (select d.dte, d.periodid, u.userid, a.activityid,
row_number() over (partition by u.userid, a.activityid order by d.dte) as seqnum
from dates d cross join
(select distinct userid from activities) u left join
activities a
on a.userid = u.userid and
a.activitystart <= d.dte and a.activityend >= d.dte
) da
group by userid, activityid, periodid, dateadd(day, -seqnum, dte)
order by userid, min(dte);
Here is a db<>fiddle.
Note: This produces results for all users and all periods -- which seems reasonable given your description. It is pretty simple to modify to filter out users with no activity during a given period.
Also, this does not go to the end of the month. Instead, it includes the complete periods. I don't see why months would play into this -- except to confuse matters -- consider if two periods have days in the same month, for instance.