Search code examples
sqlpostgresqlaggregate-functionswindow-functionscumulative-sum

Cumulative sum based on same column calculated result


I have the following table, for which I am trying to calculate a running balance, and remaining value, but the remaining value is the function of the previously calculated row, as such:

date         PR    amount  total    balance  remaining_value
----------------------------------------------------------
'2020-1-1'   1     1.0     100.0    1.0      100    -- 100 (inital total)
'2020-1-2'   1     2.0     220.0    3.0      320   -- 100 (previous row) + 220 
'2020-1-3'   1    -1.5    -172.5    1.5      160   -- 320 - 160 (see explanation 1)
'2020-1-4'   1     3.0     270.0    4.5      430   -- 160 + 270
'2020-1-5'   1     1.0      85.0    5.5      515   -- 430 + 85 
'2020-1-6'   1     2.0     202.0    7.5      717   -- 575 + 202 
'2020-1-7'   1    -4.0    -463.0    3.5      334.6 -- 717 - 382.4 (see explanation 2)
'2020-1-8'   1    -0.5     -55.0    3.0      ...
'2020-1-9'   1     2.0     214.0    5.0
'2020-1-1'   2     1.0     100      1.0      100   -- different PR: start new running total

The logic is as follows:

  • For positive amount rows, the remaining value is simply the value from the previous row in column remaining_value + the value in column total from that row.

  • For negative amount rows, it gets tickier:

Explanation 1: We start with 320 (previous row balance) and from it we remove 1.5/3.0 (absolute value of current row amount divided by previous row balance) and we multiply it by the previous row remaining_value, which is 320. The calculation gives:

320 - (1.5/3 * 320) = 160

Explanation 2: Same logic as above. 717 - (4/7.5 * 717) = 717 - 382.4

4/7.5 here represents the current row's absolute amount divided by the previous row's balance.

I tried the window function sum() but did not manage to get the desired result. Is there a way to get this done in PostgreSQL without having to resort to a loop?

Extra complexity: There are multiple products identified by PR (product id), 1, 2 etc. Each need their own running total and calculation.


Solution

  • You could create a custom aggregate function:

    CREATE OR REPLACE FUNCTION f_special_running_sum (_state numeric, _total numeric, _amount numeric, _prev_balance numeric)
      RETURNS numeric
      LANGUAGE sql IMMUTABLE AS
    'SELECT CASE WHEN _amount > 0 THEN _state + _total
                 ELSE _state * (1 + _amount / _prev_balance) END';
    
    CREATE OR REPLACE AGGREGATE special_running_sum (_total numeric, _amount numeric, _prev_balance numeric) (
      sfunc    = f_special_running_sum 
    , stype    = numeric
    , initcond = '0'
    );
    

    The CASE expression does the split: If amount is positive, just add total, else apply your (simplified) formula:
    320 * (1 + -1.5 / 3.0) instead of 320 - (1.5/3 * 320), i.e.:

    _state * (1 + _amount / _prev_balance) 
    

    Function and aggregate parameter names are only for documentation.

    Then your query can look like this:

    SELECT *
         , special_running_sum(total, amount, prev_balance) OVER (PARTITION BY pr ORDER BY date)
    FROM  (
       SELECT pr, date, amount, total
            , lag(balance, 1, '1') OVER (PARTITION BY pr ORDER BY date) AS prev_balance
       FROM   tbl
       ) t;
    

    db<>fiddle here

    We need a subquery to apply the first window function lag() and fetch the previous balance into the current row (prev_balance). I default to 1 if there is no previous row to avoid NULL values.

    Caveats:

    • If the first row has a negative total, the result is undefined. My aggregate function defaults to 0.

    • You did not declare data types, nor requirements regarding precision. I assume numeric and aim for maximum precision. The calculation with numeric is precise. But your formula produces fractional decimal numbers. Without rounding, there will be a lot of fractional digits after a couple of divisions, and the calculation will quickly degrade in performance. You'll have to strike a compromise between precision and performance. For example, doing the same with double precision has constant performance.

    Related: