Search code examples
sqlpostgresqlaggregate-functionsplpgsqlwindow-functions

Discount accumulated negative sum from column


I have the following dataset in PostgreSQL 15:

Generate data

SELECT id, value, expected_output, comment
FROM (VALUES 
      (1, 0.89, 0.89, ''), 
      (2, 0.17, 0.17, ''), 
      (3, -0.30, 0, 'accum = -0.30'), 
      (4, -0.36, 0, 'accum = -0.30 - 0.36 = -0.66'), 
      (5, 1.41, 0.75, '0.75 = 1.41 - 0.66; accum = 0'), 
      (6, -0.49, 0, 'accum = -0.49'),
      (7, 0.24, 0, '0, because 0.24 - 0.49 = -0.25 is still negative, so accum = -0.25'), 
      (8, 6.15, 5.9, '5.9 = 6.15 - 0.25; accum = 0'), 
      (9, 6.05, 6.05, '')
) 
AS t(id, value, expected_output, comment);
id value expected_output comment
1 0.89 0.89
2 0.17 0.17
3 -0.30 0 accum = -0.30
4 -0.36 0 accum = -0.30 - 0.36 = -0.66
5 1.41 0.75 0.75 = 1.41 - 0.66; accum = 0
6 -0.49 0 accum = -0.49
7 0.24 0 0, because 0.24 - 0.49 = -0.25 is still negative, so accum = -0.25
8 6.15 5.9 5.9 = 6.15 - 0.25; accum = 0
9 6.05 6.05

And I want to generate the expected output using a function. The idea is that negative values should be accumulated, and then discounted whenever a positive value occurs.

I tried creating an aggregate that conditionally sums negative values only, and then I used that as a window function. The state function of the aggregate I came up with is this:

CREATE OR REPLACE FUNCTION public.sum_if_less_accum(
    _accumulated anyelement,
    _current anyelement)
    RETURNS anyelement
    LANGUAGE 'plpgsql'
AS $BODY$
BEGIN
    IF _current < 0 OR _accumulated < 0THEN
        RETURN _accumulated + _current;
    ELSIF _current > _accumulated THEN
        return 0;
    END IF;
    
    RETURN _accumulated;
END
$BODY$;

I tried using that with some CASE expressions but couldn't get to the expected result. Is there a way to directly create an aggregate that does what I want?


Solution

  • Yes, you can create a custom aggregate function:

    -- state transition function 
    CREATE OR REPLACE FUNCTION f_sum_dept(numeric, numeric)
      RETURNS numeric
      LANGUAGE sql IMMUTABLE PARALLEL SAFE
    RETURN LEAST($1, 0) + $2;
    
    -- final function
    CREATE OR REPLACE FUNCTION f_sum_dept_final(numeric)
      RETURNS numeric
      LANGUAGE sql IMMUTABLE PARALLEL SAFE
    RETURN GREATEST($1, 0);
    
    -- aggregate
    CREATE OR REPLACE AGGREGATE sum_dept(numeric) (
      SFUNC     = f_sum_dept
    , STYPE     = numeric
    , INITCOND  = 0
    , FINALFUNC = f_sum_dept_final
    );
    

    This expects numeric input. (A polymorphic version is possible, too.)

    Basically, this builds a running sum, but discounts positive subtotals from the previous iteration (like that had been paid out).
    Finally, instead of negative subtotals, 0 is returned.

    SQL standard functions like I use here require Postgres 14 or later. Works for you (pg 15). For older versions, use quoted function bodies. Simple to adjust. See:

    Your query is dead simple now. Use the aggregate in a window function call:

    SELECT id, value
         , sum_dept(value) OVER (ORDER BY id) AS expected_output
    FROM   tbl;
    

    fiddle

    More generic aggregate

    The final function in the aggregate is optional and makes it more specialized. You could leave that away to make the aggregate more generic, and implement the final step in the query instead:

    CREATE OR REPLACE AGGREGATE sum_dept(numeric) (
      SFUNC     = f_sum_dept  -- same as above
    , STYPE     = numeric
    , INITCOND  = 0
    );
    
    SELECT id, value
         , GREATEST(sum_dept(value) OVER (ORDER BY id), 0) AS expected_output
    FROM   tbl;
    

    fiddle