Search code examples
sqlmysqlwindow-functions

Cumulative Sum with Difference from next row


Here is the dataset: (Desired Output is highlighted as Bold in image) Trying to do this on Snowflake SQL, Column F in the image has an explanation of how column E is derived

Excel Table with Image, Result in Bold and Explanation

I want to get column 5, Column 5 is the sum for each region and type, minus the amount associated with each month.

For Region USA and Type Apple, line 1 on Column 5 shows 160, meaning ( Total of USA, Apple is 170) so we subtract 10 (Amount of MONTH 1) from it to get 160 line 2 of column 5 shows 140, the amount associated with this row is 20 (Amount of MONTH 2), so we subtract it to get 140 .....

Similarly for Region Europe and Type Orange, the total is 135, for line 13 of column 5, we subtract 10 from 135 to get 125 for line 14 of column 5, we subtract 5 from 125 to get 120.

select 
  *, (sum - l_amt) as amount, (sum - amt) as amount2
from (
select 
   region, type, month, amount,
   sum(amount) over (partition by region, type) as sum,
   lag(amount) over (partition by region, type, month order by month asc) as l_amt
from table)

Both Amount and Amount2 from the code is wrong, not sure how to do this?


Solution

  • You can use your approach (corrected as suggested by ValNik):

    select 
        region, type, month, amount,
        sum(amount) over (partition by region, type) -
        sum(amount) over (partition by region, type order by month asc) as amt
    from tbl
    order by region desc, type asc, month asc;
    

    Or you can use the cumulative sum ordered by month desc with a 1 row offset:

    select *,
        ifnull(sum(amount) over (
            partition by region, type
            order by month desc
            rows between unbounded preceding and 1 preceding
        ), 0) as amount
    from tbl
    order by region desc, type asc, month asc;
    

    Both will return your desired output, but the second query is a little more efficient as it uses only one temporary table, as opposed to the two required by your query.

    Here's a db<>fiddle.