Search code examples
sqlsql-serverdateintervals

Counting days cummulatively over the last year over multiple date intervals within groups


This is a continuation of a previous question. After having resolved my initial question (Previous question), I realized that I needed to add some complexity to my question to properly solve my usecase.

I need to calculate the first 100 days of a year within groups which contains multiple date intervals. The start of the year is set arbitrarily from the first date of the first interval within each group. I also need to be able to output two variables [Location] and [Amount] associated with each interval. My problem is then that some of these intervals will overlap with a single day (identical EndDate and following StartDate), which messes up my count.

I have the following sample data Fiddle for sample data and code attempt

Group Location Amount StartDate EndDate
1 Location1 100 2020-01-01 2021-02-01
1 Location2 150 2021-02-01 2022-07-01
2 Location3 200 2020-02-01 2020-03-01
2 Location1 100 2020-09-01 2020-12-01

It would like the following output:

Group Location Amount YearMonth CountDays
1 Location1 100.00 202001 31
1 Location1 100.00 202002 29
1 Location1 100.00 202003 31
1 Location1 100.00 202004 9
1 Location1 100.00 202005 0
1 Location1 100.00 202006 0
1 Location1 100.00 202007 0
1 Location1 100.00 202008 0
1 Location1 100.00 202009 0
1 Location1 100.00 202010 0
1 Location1 100.00 202011 0
1 Location1 100.00 202012 0
1 Location1 100.00 202101 31
1 Location1 100.00 202102 1
1 Location2 150.00 202102 28
1 Location2 150.00 202103 31
1 Location2 150.00 202104 9
1 Location2 150.00 202105 0
1 Location2 150.00 202106 0
1 Location2 150.00 202107 0
1 Location2 150.00 202108 0
1 Location2 150.00 202109 0
1 Location2 150.00 202110 0
1 Location2 150.00 202111 0
1 Location2 150.00 202112 0
1 Location2 150.00 202201 31
1 Location2 150.00 202202 28
1 Location2 150.00 202203 31
1 Location2 150.00 202204 10
1 Location2 150.00 202205 0
1 Location2 150.00 202206 0
1 Location2 150.00 202207 0
2 Location3 200.00 202002 29
2 Location3 200.00 202003 1
2 Location1 100.00 202009 30
2 Location1 100.00 202010 31
2 Location1 100.00 202011 9
2 Location1 100.00 202012 0

I tried using the following (rewritten from previous question):

DECLARE @MinDate DATE = (SELECT MIN(StartDate) FROM SampleData)
DECLARE @MaxDate DATE = (SELECT MAX(EndDate) FROM SampleData)

;WITH n AS 
(
    SELECT 0 AS n
    UNION ALL
    SELECT n + 1 
    FROM n 
    WHERE n + 1 <= DATEDIFF(DAY, @MinDate, @MaxDate)
)
SELECT DATEADD(DAY, n, @MinDate) AS Date
INTO #Calendar
FROM n 
OPTION (MAXRECURSION 0)

  
CREATE TABLE #Result
(
    Date DATE,
    [Group] VARCHAR(20),
    [Location] VARCHAR(100),
    [Amount] DECIMAL(10,2),
    Counted INT,
    CumulativeCount INT,
    ResetDate DATE
);

DECLARE @Group VARCHAR(20), @Date DATE, @Location VARCHAR(100), @Amount DECIMAL(10,2);

DECLARE GroupCursor CURSOR FOR 
SELECT DISTINCT [Group], [Location], [Amount] FROM SampleData
OPEN GroupCursor
FETCH NEXT FROM GroupCursor INTO @Group, @Location, @Amount

WHILE @@FETCH_STATUS = 0
BEGIN
    DECLARE DateCursor CURSOR FOR 
    SELECT Date FROM #Calendar ORDER BY Date
    OPEN DateCursor
    FETCH NEXT FROM DateCursor INTO @Date

    DECLARE @CumulativeCount INT = 0, @ResetDate DATE = NULL
    WHILE @@FETCH_STATUS = 0
    BEGIN

        DECLARE @Counted INT = 
        (
            SELECT CASE WHEN COUNT(*) > 0 THEN 1 ELSE 0 END
            FROM SampleData
            WHERE [Group] = @Group AND @Date BETWEEN StartDate AND EndDate
        )

            IF @Counted = 1 AND (@CumulativeCount = 0 OR DATEDIFF(DAY, @ResetDate, @Date) >= DATEDIFF(DAY, @ResetDate, DATEADD(YEAR, 1, @ResetDate))) --- Altered check to account for leap years. 
                BEGIN
                    SET @CumulativeCount = 1
                    SET @ResetDate = @Date  
                END
            ELSE IF @Counted = 1 AND @CumulativeCount < 100 
                SET @CumulativeCount = @CumulativeCount + 1
            ELSE IF @Counted = 1 AND @CumulativeCount >= 100
                SET @Counted = 0
        

        IF  
        ( 
            SELECT CASE WHEN COUNT(*) > 0 THEN 1 ELSE 0 END
              FROM SampleData
              WHERE [Group] = @Group AND [Location] = @Location AND [Amount] = @Amount AND @Date BETWEEN StartDate AND EndDate
        ) = 1
            BEGIN
                INSERT INTO #Result (Date, [Group], [Location], [Amount], Counted, CumulativeCount, ResetDate)
                VALUES (@Date, @Group, @Location, @Amount, @Counted, @CumulativeCount, @ResetDate)
            END

        FETCH NEXT FROM DateCursor INTO @Date
    END
    CLOSE DateCursor
    DEALLOCATE DateCursor

    FETCH NEXT FROM GroupCursor INTO @Group, @Location, @Amount

END
CLOSE GroupCursor
DEALLOCATE GroupCursor

SELECT * FROM #Result
ORDER BY [Group], Date

SELECT 
    [Group],
    [Location],
    [Amount],
    FORMAT(Date, 'yyyyMM') AS YearMonth,
    SUM(Counted) AS CountDays
FROM #Result
GROUP BY [Group], [Location], [Amount], FORMAT(Date, 'yyyyMM')
ORDER BY [Group], YearMonth, [Location]

This does nearly exactly what I need, but I have a problem with the duplicate dates (example of 2021-02-01). Here, the variable @CumulativeCount does not update properly across the duplicate dates - which leads to counting more than 100 days in the end. I have tried multiple workarounds and fixes but have so far come up with no solution that fixes it (and doesnt break anything).

Here's a snippet of the #Results to show the problem:

Date Group Location Amount Counted CumulativeCount ResetDate
2021-01-31 1 Location1 100.00 1 31 2021-01-01
2021-02-01 1 Location1 100.00 1 32 2021-01-01
2021-02-01 1 Location2 150.00 1 32 2021-01-01
2021-02-02 1 Location2 150.00 1 33 2021-01-01

Any help would be much appreciated.


Solution

  • There's no need for cursors if you take advantage of analytic functions to compute the cumulative days numbers.

    My query uses a SQL Server 2022 function to explode the dates into a series of months via cross apply and generate_series(). You can replace that with whatever method you prefer. It also uses date_bucket() to compute a relative reference date for each logical year. Substitute an equivalent calculation if necessary. The rest is just a straightforward summation followed by some case logic to cap at 100 days.

    with explode as (
        select *,
            date_bucket(year, 1, Mnth,
                min(StartDate) over (partition by [Group])) as ReferenceDate,
            case when Mnth <> eomonth(EndDate) then day(Mnth) else 1 end as Counter
        from SampleData data cross apply (
            select eomonth(dateadd(month, value, StartDate)) as Mnth
            from generate_series(0, 99) -- adjust if 100 months isn't enough...
            where dateadd(month, value, StartDate) <= EndDate
        ) months
    ), accumulate as (
        select *,
            sum(Counter) over (partition by [Group], ReferenceDate order by Mnth) as CumDays
        from explode
    )
    select [Group], Location, Amount,
        convert(char(6), Mnth, 112) as YearMonth,
        case when CumDays <= 100 then Counter
             when CumDays - Counter < 100 then 100 - (CumDays - Counter)
             else 0 end as CountDays
    from accumulate
    order by [Group], YearMonth, Location;
    

    https://dbfiddle.uk/ZSYJpC5A