Search code examples
sqlpostgresqlwindow-functions

Sum until threshold value reached and then reset the counter


user_id | date                 | distance
1       | 2019-04-09 00:00:00  | 2
1       | 2019-04-09 00:00:30  | 5
1       | 2019-04-09 00:01:00  | 3
1       | 2019-04-09 00:01:45  | 7
1       | 2019-04-09 00:02:30  | 6
1       | 2019-04-09 00:03:00  | 1

How do I sum distance over next row until threshold point is reached and reset the counter again.

For instance if the threshold value is 10 I am trying to get the following output:

1       | 2019-04-09 00:00:00  | 2
1       | 2019-04-09 00:00:30  | 7            (2 + 5)
1       | 2019-04-09 00:01:00  | 10           ( 7 + 3 )
1       | 2019-04-09 00:01:45  | 7            RESET
1       | 2019-04-09 00:02:30  | 13           (7 + 6 )
1       | 2019-04-09 00:03:00  | 1            RESET

But all I could achieve is get cumulative distance with following query:

SELECT *, sum(distance) over (order by date asc) as running_distance FROM table;

I am using PostgreSQL.


Solution

  • Use user-defined aggregate

    Live test: http://sqlfiddle.com/#!17/16716/2

    SELECT *, sum_with_reset(distance, 10) over (order by date asc) as running_distance 
    FROM tbl;
    

    User-defined aggregate sum_with_reset definition:

    create or replace function sum_reset_accum(
        _accumulated numeric, _current numeric, _threshold numeric
    )
    returns numeric as
    $$
        select case when _accumulated >= _threshold then
            _current
        else
            _current + _accumulated
        end    
    $$ language sql;
    
    
    create aggregate sum_with_reset(numeric, numeric)
    (
        sfunc = sum_reset_accum,
        stype = numeric,
        initcond = 0
    );
    

    Data

    CREATE TABLE tbl
        ("user_id" int, "date" timestamp, "distance" int)
    ;
    
    INSERT INTO tbl
        ("user_id", "date", "distance")
    VALUES
        (1, '2019-04-09 00:00:00', 2),
        (1, '2019-04-09 00:00:30', 5),
        (1, '2019-04-09 00:01:00', 3),
        (1, '2019-04-09 00:01:45', 7),
        (1, '2019-04-09 00:02:30', 6),
        (1, '2019-04-09 00:03:00', 1)
    ;
    

    Output:

    | user_id |                 date | distance | running_distance |
    |---------|----------------------|----------|------------------|
    |       1 | 2019-04-09T00:00:00Z |        2 |                2 |
    |       1 | 2019-04-09T00:00:30Z |        5 |                7 |
    |       1 | 2019-04-09T00:01:00Z |        3 |               10 |
    |       1 | 2019-04-09T00:01:45Z |        7 |                7 |
    |       1 | 2019-04-09T00:02:30Z |        6 |               13 |
    |       1 | 2019-04-09T00:03:00Z |        1 |                1 |
    

    One-liner:

    create or replace function sum_reset_accum(
        _accumulated numeric, _current numeric, _threshold numeric
    )
    returns numeric as
    $$
        select _current + _accumulated * (_accumulated < _threshold)::int
    $$ language 'sql';
    

    Postgres boolean can cast true to 1, false to 0 by using cast operator ::int.

    You can use plpgsql language too:

    create or replace function sum_reset_accum(
        _accumulated numeric, _current numeric, _threshold numeric
    )
    returns numeric as
    $$begin
        return _current + _accumulated * (_accumulated < _threshold)::int;
    end$$ language 'plpgsql';
    

    Note that you cannot create plpgsql function on sqlfiddle.com, so you cannot test that plpgsql code on sqlfiddle.com. You can, on your machine though.