Search code examples
sqlsql-servergroup-bysql-server-2016cumulative-sum

Get cumulative sum that reset for each year


Please consider this table:

Year    Month   Value   YearMonth
2011      1      70      201101
2011      1      100     201101
2011      2      200     201102
2011      2      50      201102
2011      3      80      201103
2011      3      250     201103
2012      1      100     201201
2012      2      200     201202
2012      3      250     201203

I want to get a cumulative sum based on each year. For the above table I want to get this result:

Year    Month       Sum
-----------------------
2011      1         170
2011      2         420  <--- 250 + 170
2011      3         750  <--- 330 + 250 + 170
2012      1         100
2012      2         300  <--- 200 + 100
2012      3         550  <--- 250 + 200 + 100

I wrote this code:

Select c1.YearMonth, Sum(c2.Value) CumulativeSumValue
From @Tbl c1,  @Tbl c2
Where c1.YearMonth >= c2.YearMonth
Group By c1.YearMonth
Order By c1.YearMonth Asc

But its CumulativeSumValue is calculated twice for each YearMonth:

YearMonth   CumulativeSumValue
  201101          340           <--- 170 * 2
  201102          840           <--- 420 * 2
  201103          1500
  201201          850
  201202          1050
  201203          1300

How can I achieve my desired result?

I wrote this query:

select year, (Sum (aa.[Value]) Over (partition by aa.Year Order By aa.Month)) as 'Cumulative Sum'
from @Tbl aa

But it returned multiple records for 2011:

Year    Cumulative Sum
2011        170
2011        170
2011        420
2011        420
2011        750
2011        750
2012        100
2012        300
2012        550

Solution

  • You are creating a cartesian product here. In your ANSI-89 implicit JOIN (you really need to stop using those and switch to ANSI-92 syntax) you are joining on c1.YearMonth >= c2.YearMonth.

    For your first month you have two rows with the same value of the year and month, so each of those 2 rows joins to the other 2; this results in 4 rows:

    Year Month Value1 Value2
    2011 1 70 70
    2011 1 70 100
    2011 1 100 70
    2011 1 100 100

    When you SUM this value you get 340, not 170, as you have 70+70+100+100.

    Instead of a triangular JOIN however, you should be using a windowed SUM. As you want to also get aggregate nmonths into a single rows, you'll need to also aggregate inside the windowed SUM like so:

    SELECT V.YearMonth,
           SUM(SUM(V.Value)) OVER (PARTITION BY Year ORDER BY V.YearMonth) AS CumulativeSum
    FROM (VALUES (2011, 1, 70, 201101),
                 (2011, 1, 100, 201101),
                 (2011, 2, 200, 201102),
                 (2011, 2, 50, 201102),
                 (2011, 3, 80, 201103),
                 (2011, 3, 250, 201103),
                 (2012, 1, 100, 201201),
                 (2012, 2, 200, 201202),
                 (2012, 3, 250, 201203)) V (Year, Month, Value, YearMonth)
    GROUP BY V.YearMonth,
             V.Year;