Search code examples
sqlt-sqldatetimesumwindow-functions

Rolling sum based on date (when dates are missing)


You may be aware of rolling the results of an aggregate over a specific number of preceding rows. I.e.: how many hot dogs did I eat over the last 7 days

SELECT HotDogCount, 
       DateKey, 
       SUM(HotDogCount) OVER (ORDER BY DateKey ROWS 6 PRECEDING) AS HotDogsLast7Days
FROM dbo.HotDogConsumption

Results:

+-------------+------------+------------------+
| HotDogCount |  DateKey   | HotDogsLast7Days |
+-------------+------------+------------------+
|           3 | 09/21/2020 |                3 |
|           2 | 9/22/2020  |                5 |
|           1 | 09/23/2020 |                6 |
|           1 | 09/24/2020 |                7 |
|           1 | 09/25/2020 |                8 |
|           4 | 09/26/2020 |               12 |
|           1 | 09/27/2020 |               13 |
|           3 | 09/28/2020 |               13 |
|           2 | 09/29/2020 |               13 |
|           1 | 09/30/2020 |               13 |
+-------------+------------+------------------+

Now, the problem I am having is when there are gaps in the dates. So, basically, one day my intestines and circulatory system are screaming at me: "What the heck are you doing, you're going to kill us all!!!" So, I decide to give my body a break for a day and now there is no record for that day. When I use the "ROWS 6 PRECEDING" method, I will now be reaching back 8 days, rather than 7, because one day was missed.

So, the question is, do any of you know how I could use the OVER clause to truly use a date value (something like "DATEADD(day,-7,DateKey)") to determine how many previous rows should be summed up for a true 7 day rolling sum, regardless of whether I only ate hot dogs on one day or on all 7 days?

Side note, to have a record of 0 for the days I didn't eat any hotdogs is not an option. I understand that I could use an array of dates and left join to it and do a

CASE WHEN Datekey IS NULL THEN 0 END

type of deal, but I would like to find out if there is a different way where the rows preceding value can somehow be determined dynamically based on the date.


Solution

  • Window functions are the right approach in theory. But to look back at the 7 preceding days (not rows), we need a range frame specification - which, unfornately, SQL Server does not support.

    I am going to recommend a subquery, or a lateral join:

    select hdc.*, hdc1.*
    from dbo.HotDogConsumption hdc
    cross apply (
        select coalesce(sum(HotDogCount), 0) HotDogsLast7Days 
        from dbo.HotDogConsumption hdc1
        where hdc1.datekey >= dateadd(day, -7, hdc.datekey)
          and hdc1.datekey < hdc.datekey
    ) hdc1
    

    You might want to adjust the conditions in the where clause of the subquery to the precise frame that you want. The above code computes over the last 7 days, not including today. Something equivalent to your current attempt would be like:

        where hdc1.datekey >= dateadd(day, -6, hdc.datekey)
          and hdc1.datekey <= hdc.datekey