Search code examples
sqlsql-serverrolling-computation

Rolling sum where category values are replaced/updated


I'm trying to get a rolling sum working where categories/groups Amount change at any given date - when the change happens that category's new value becomes apart of the rolling sum, but the previous value of that category is then ignored; so it's a rolling sum, but only based on the latest of each category (at that point in time).

Example data (SumAmount being the problem trying to solve)

txn_id | cust_id | trans_date |  Category | amount | SumAmount
-----------------------------------------------------------------
     1 |       1 |  2020-01-01|  Ball     |      5 |   5     --first tran so sum is 5
     2 |       1 |  2020-01-02|  Cup      |      5 |   10    --sum is 10 (ball=5,Cup=5)
     3 |       1 |  2020-01-03|  Ball     |      2 |   7     --sum is 7 (ball=2,Cup=5) 
     4 |       1 |  2020-02-04|  Ball     |      4 |   9     --sum is 9 (ball=4,Cup=5)
     5 |       1 |  2020-02-05|  Ball     |      1 |   6     --sum is 6 (ball=1,Cup=5)
     6 |       1 |  2020-02-06|  Cup      |      10|   11    --sum is 11(ball=1,Cup=10)
     7 |       1 |  2020-02-07|  Phone    |      5 |   16    --sum is 16(ball=1,Cup=10,Phone=5)
     8 |       1 |  2020-02-08|  Cup      |      5 |   11    --sum is 11(ball=1,Cup=5,Phone=5)
     9 |       1 |  2020-02-09|  Ball     |      5 |   15    --sum is 15(ball=5,Cup=5,Phone=5)

I've got this working within a cursor but wondering if SET based is possible

The cursor goes like:

CREATE PROCEDURE [dbo].[PriceHistory](@CustId int, @MaxPriceHistory decimal(16,2) Output)
create table #PriceHistory ( CategoryID uniqueidentifier, Amount decimal(16,2))

    declare pricehistory_cursor CURSOR FOR
    select CategoryID, Amount
    from mytable
    where CustId =@CustId
    order by trans_date;

    declare @CategoryID uniqueidentifier
    declare @Amount decimal(16,2)
    declare @CurrentTotal decimal(16,2)

    set @MaxPriceHistory = 0

    open pricehistory_cursor
    fetch next from pricehistory_cursor into @CategoryID, @Amount

    WHILE @@FETCH_STATUS = 0  
    BEGIN 
           if (exists(select * from #PriceHistory where CategoryID = @CategoryID))
                  update #PriceHistory set Amount = @actualAmount where CategoryID = @CategoryID 
           else
                  insert into #PriceHistory(CategoryID,Amount) values (@CategoryID, @Amount)

           select @CurrentTotal = sum(Amount) from #PriceHistory

           if (@CurrentTotal > @MaxPriceHistory)
                  set @MaxPriceHistory = @CurrentTotal

           fetch next from pricehistory_cursor into @CategoryID, @Amount                 
    END

    close pricehistory_cursor
    deallocate pricehistory_cursor;

Ultimately, I'm looking for the Max SumAmount throughout the life of the transactions (column SumAmount in provided example), in which for this example it would be 16.

I get what the cursor is doing and I get why it works that way (replacing Amount for that certain Category if already exists (this is the bit I'm stumped on for SET based approach, how would I get the Cup amount of 5, when txn_id = 5 happens?), and summing it up along with all other latest category amounts made at that point in time), I just can't get my head around if it's possible to do with either some kind of recursive CTE or ROW_NUMBER.


Solution

  • Since the data is in a fresh new temporary table, that also means that the primary key doesn't have gaps.

    That's a nice situation for a recursive CTE.

    The query below keeps the latest amount for the balls, cups and phones.
    Then the calculation of the sum simply depends on the category.

    WITH RCTE_BALL_CUP_PHONE AS
    (
       SELECT txn_id, cust_id, trans_date, category, amount
       , CASE WHEN category = 'Ball'  THEN amount ELSE 0 END AS NearestBallAmount
       , CASE WHEN category = 'Cup'   THEN amount ELSE 0 END AS NearestCupAmount
       , CASE WHEN category = 'Phone' THEN amount ELSE 0 END AS NearestPhoneAmount
       , amount AS SumAmount
       FROM  #PriceHistory AS tmp
       WHERE txn_id = 1
    
       UNION ALL
    
       SELECT tmp.txn_id, tmp.cust_id, tmp.trans_date, tmp.category, tmp.amount
       , CASE WHEN tmp.category = 'Ball'  THEN tmp.amount ELSE c.NearestBallAmount END
       , CASE WHEN tmp.category = 'Cup'   THEN tmp.amount ELSE c.NearestCupAmount END
       , CASE WHEN tmp.category = 'Phone' THEN tmp.amount ELSE c.NearestPhoneAmount END
       , CASE 
         WHEN tmp.category = 'Ball'  THEN (tmp.amount + c.NearestCupAmount  + c.NearestPhoneAmount)
         WHEN tmp.category = 'Cup'   THEN (tmp.amount + c.NearestBallAmount + c.NearestPhoneAmount)
         WHEN tmp.category = 'Phone' THEN (tmp.amount + c.NearestCupAmount  + c.NearestBallAmount)
         ELSE tmp.Amount
         END
       FROM RCTE_BALL_CUP_PHONE c
       JOIN #PriceHistory AS tmp 
         ON tmp.txn_id = c.txn_id + 1
    )
    SELECT txn_id, cust_id, trans_date, category, amount
    , SumAmount
    FROM RCTE_BALL_CUP_PHONE
    ORDER BY txn_id;
    
    txn_id | cust_id | trans_date | category | amount | SumAmount
    -----: | ------: | :--------- | :------- | -----: | --------:
         1 |       1 | 2020-01-01 | Ball     |      5 |         5
         2 |       1 | 2020-01-02 | Cup      |      5 |        10
         3 |       1 | 2020-01-03 | Ball     |      2 |         7
         4 |       1 | 2020-02-04 | Ball     |      4 |         9
         5 |       1 | 2020-02-05 | Ball     |      1 |         6
         6 |       1 | 2020-02-06 | Cup      |     10 |        11
         7 |       1 | 2020-02-07 | Phone    |      5 |        16
         8 |       1 | 2020-02-08 | Cup      |      5 |        11
         9 |       1 | 2020-02-09 | Ball     |      5 |        15
    

    db<>fiddle here

    For future reference sake, here's an adaption of lptr's awesome JSON method.
    It'll work for more than 3 categories without having to change anything.

    with RCTE as
    (
      select *, cast(concat('{"', category, '":', amount, '}') as varchar(max)) as j
      from #PriceHistory t
      where txn_id=1
    
      union all
    
      select t.*, cast(json_modify(cte.j, concat('$.', t.category), t.amount) as varchar(max))
      from RCTE cte
      join #PriceHistory t on t.txn_id = cte.txn_id+1
    )
    select txn_id, cust_id, trans_date, category, amount
    , (select sum(cast(value as int)) from openjson(j)) as SumAmount
    , j
    from RCTE
    order by txn_id
    
    txn_id | cust_id | trans_date | category | amount | SumAmount | j                            
    -----: | ------: | :--------- | :------- | -----: | --------: | :----------------------------
         1 |       1 | 2020-01-01 | Ball     |      5 |         5 | {"Ball":5}                   
         2 |       1 | 2020-01-02 | Cup      |      5 |        10 | {"Ball":5,"Cup":5}           
         3 |       1 | 2020-01-03 | Ball     |      2 |         7 | {"Ball":2,"Cup":5}           
         4 |       1 | 2020-02-04 | Ball     |      4 |         9 | {"Ball":4,"Cup":5}           
         5 |       1 | 2020-02-05 | Ball     |      1 |         6 | {"Ball":1,"Cup":5}           
         6 |       1 | 2020-02-06 | Cup      |     10 |        11 | {"Ball":1,"Cup":10}          
         7 |       1 | 2020-02-07 | Phone    |      5 |        16 | {"Ball":1,"Cup":10,"Phone":5}
         8 |       1 | 2020-02-08 | Cup      |      5 |        11 | {"Ball":1,"Cup":5,"Phone":5} 
         9 |       1 | 2020-02-09 | Ball     |      5 |        15 | {"Ball":5,"Cup":5,"Phone":5}