This is a continuation of a previous question. After having resolved my initial question (Previous question), I realized that I needed to add some complexity to my question to properly solve my usecase.
I need to calculate the first 100 days of a year within groups which contains multiple date intervals. The start of the year is set arbitrarily from the first date of the first interval within each group. I also need to be able to output two variables [Location] and [Amount] associated with each interval. My problem is then that some of these intervals will overlap with a single day (identical EndDate and following StartDate), which messes up my count.
I have the following sample data Fiddle for sample data and code attempt
Group | Location | Amount | StartDate | EndDate |
---|---|---|---|---|
1 | Location1 | 100 | 2020-01-01 | 2021-02-01 |
1 | Location2 | 150 | 2021-02-01 | 2022-07-01 |
2 | Location3 | 200 | 2020-02-01 | 2020-03-01 |
2 | Location1 | 100 | 2020-09-01 | 2020-12-01 |
It would like the following output:
Group | Location | Amount | YearMonth | CountDays |
---|---|---|---|---|
1 | Location1 | 100.00 | 202001 | 31 |
1 | Location1 | 100.00 | 202002 | 29 |
1 | Location1 | 100.00 | 202003 | 31 |
1 | Location1 | 100.00 | 202004 | 9 |
1 | Location1 | 100.00 | 202005 | 0 |
1 | Location1 | 100.00 | 202006 | 0 |
1 | Location1 | 100.00 | 202007 | 0 |
1 | Location1 | 100.00 | 202008 | 0 |
1 | Location1 | 100.00 | 202009 | 0 |
1 | Location1 | 100.00 | 202010 | 0 |
1 | Location1 | 100.00 | 202011 | 0 |
1 | Location1 | 100.00 | 202012 | 0 |
1 | Location1 | 100.00 | 202101 | 31 |
1 | Location1 | 100.00 | 202102 | 1 |
1 | Location2 | 150.00 | 202102 | 28 |
1 | Location2 | 150.00 | 202103 | 31 |
1 | Location2 | 150.00 | 202104 | 9 |
1 | Location2 | 150.00 | 202105 | 0 |
1 | Location2 | 150.00 | 202106 | 0 |
1 | Location2 | 150.00 | 202107 | 0 |
1 | Location2 | 150.00 | 202108 | 0 |
1 | Location2 | 150.00 | 202109 | 0 |
1 | Location2 | 150.00 | 202110 | 0 |
1 | Location2 | 150.00 | 202111 | 0 |
1 | Location2 | 150.00 | 202112 | 0 |
1 | Location2 | 150.00 | 202201 | 31 |
1 | Location2 | 150.00 | 202202 | 28 |
1 | Location2 | 150.00 | 202203 | 31 |
1 | Location2 | 150.00 | 202204 | 10 |
1 | Location2 | 150.00 | 202205 | 0 |
1 | Location2 | 150.00 | 202206 | 0 |
1 | Location2 | 150.00 | 202207 | 0 |
2 | Location3 | 200.00 | 202002 | 29 |
2 | Location3 | 200.00 | 202003 | 1 |
2 | Location1 | 100.00 | 202009 | 30 |
2 | Location1 | 100.00 | 202010 | 31 |
2 | Location1 | 100.00 | 202011 | 9 |
2 | Location1 | 100.00 | 202012 | 0 |
I tried using the following (rewritten from previous question):
DECLARE @MinDate DATE = (SELECT MIN(StartDate) FROM SampleData)
DECLARE @MaxDate DATE = (SELECT MAX(EndDate) FROM SampleData)
;WITH n AS
(
SELECT 0 AS n
UNION ALL
SELECT n + 1
FROM n
WHERE n + 1 <= DATEDIFF(DAY, @MinDate, @MaxDate)
)
SELECT DATEADD(DAY, n, @MinDate) AS Date
INTO #Calendar
FROM n
OPTION (MAXRECURSION 0)
CREATE TABLE #Result
(
Date DATE,
[Group] VARCHAR(20),
[Location] VARCHAR(100),
[Amount] DECIMAL(10,2),
Counted INT,
CumulativeCount INT,
ResetDate DATE
);
DECLARE @Group VARCHAR(20), @Date DATE, @Location VARCHAR(100), @Amount DECIMAL(10,2);
DECLARE GroupCursor CURSOR FOR
SELECT DISTINCT [Group], [Location], [Amount] FROM SampleData
OPEN GroupCursor
FETCH NEXT FROM GroupCursor INTO @Group, @Location, @Amount
WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE DateCursor CURSOR FOR
SELECT Date FROM #Calendar ORDER BY Date
OPEN DateCursor
FETCH NEXT FROM DateCursor INTO @Date
DECLARE @CumulativeCount INT = 0, @ResetDate DATE = NULL
WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE @Counted INT =
(
SELECT CASE WHEN COUNT(*) > 0 THEN 1 ELSE 0 END
FROM SampleData
WHERE [Group] = @Group AND @Date BETWEEN StartDate AND EndDate
)
IF @Counted = 1 AND (@CumulativeCount = 0 OR DATEDIFF(DAY, @ResetDate, @Date) >= DATEDIFF(DAY, @ResetDate, DATEADD(YEAR, 1, @ResetDate))) --- Altered check to account for leap years.
BEGIN
SET @CumulativeCount = 1
SET @ResetDate = @Date
END
ELSE IF @Counted = 1 AND @CumulativeCount < 100
SET @CumulativeCount = @CumulativeCount + 1
ELSE IF @Counted = 1 AND @CumulativeCount >= 100
SET @Counted = 0
IF
(
SELECT CASE WHEN COUNT(*) > 0 THEN 1 ELSE 0 END
FROM SampleData
WHERE [Group] = @Group AND [Location] = @Location AND [Amount] = @Amount AND @Date BETWEEN StartDate AND EndDate
) = 1
BEGIN
INSERT INTO #Result (Date, [Group], [Location], [Amount], Counted, CumulativeCount, ResetDate)
VALUES (@Date, @Group, @Location, @Amount, @Counted, @CumulativeCount, @ResetDate)
END
FETCH NEXT FROM DateCursor INTO @Date
END
CLOSE DateCursor
DEALLOCATE DateCursor
FETCH NEXT FROM GroupCursor INTO @Group, @Location, @Amount
END
CLOSE GroupCursor
DEALLOCATE GroupCursor
SELECT * FROM #Result
ORDER BY [Group], Date
SELECT
[Group],
[Location],
[Amount],
FORMAT(Date, 'yyyyMM') AS YearMonth,
SUM(Counted) AS CountDays
FROM #Result
GROUP BY [Group], [Location], [Amount], FORMAT(Date, 'yyyyMM')
ORDER BY [Group], YearMonth, [Location]
This does nearly exactly what I need, but I have a problem with the duplicate dates (example of 2021-02-01). Here, the variable @CumulativeCount does not update properly across the duplicate dates - which leads to counting more than 100 days in the end. I have tried multiple workarounds and fixes but have so far come up with no solution that fixes it (and doesnt break anything).
Here's a snippet of the #Results to show the problem:
Date | Group | Location | Amount | Counted | CumulativeCount | ResetDate |
---|---|---|---|---|---|---|
2021-01-31 | 1 | Location1 | 100.00 | 1 | 31 | 2021-01-01 |
2021-02-01 | 1 | Location1 | 100.00 | 1 | 32 | 2021-01-01 |
2021-02-01 | 1 | Location2 | 150.00 | 1 | 32 | 2021-01-01 |
2021-02-02 | 1 | Location2 | 150.00 | 1 | 33 | 2021-01-01 |
Any help would be much appreciated.
There's no need for cursors if you take advantage of analytic functions to compute the cumulative days numbers.
My query uses a SQL Server 2022 function to explode the dates into a series of months via cross apply
and generate_series()
. You can replace that with whatever method you prefer. It also uses date_bucket()
to compute a relative reference date for each logical year. Substitute an equivalent calculation if necessary. The rest is just a straightforward summation followed by some case
logic to cap at 100 days.
with explode as (
select *,
date_bucket(year, 1, Mnth,
min(StartDate) over (partition by [Group])) as ReferenceDate,
case when Mnth <> eomonth(EndDate) then day(Mnth) else 1 end as Counter
from SampleData data cross apply (
select eomonth(dateadd(month, value, StartDate)) as Mnth
from generate_series(0, 99) -- adjust if 100 months isn't enough...
where dateadd(month, value, StartDate) <= EndDate
) months
), accumulate as (
select *,
sum(Counter) over (partition by [Group], ReferenceDate order by Mnth) as CumDays
from explode
)
select [Group], Location, Amount,
convert(char(6), Mnth, 112) as YearMonth,
case when CumDays <= 100 then Counter
when CumDays - Counter < 100 then 100 - (CumDays - Counter)
else 0 end as CountDays
from accumulate
order by [Group], YearMonth, Location;