Search code examples
postgresqlsql-updatewindow-functions

UPDATE column with cumulative sum


I have a column a_increment and I want to use it to calculate a column a_cumulative, which is the sum of all earlier a_increment, including the current row. "Earlier" as defined by the timestamp column "DateTime".

I have tried several approaches, for example an iterative approach where I calculate the new a_cumulative as the previous a_cumulative + the current a_increment. However, this was done backwards in time, which does not yield the correct result.

I found that a window function could yield the correct result, however, window functions are not allowed in an update of the table.

My intuition is that using an iterative approach would be fastest, but I don't know how to force the iteration to go forward.
Here is what I expect:

ID  "DateTime"                  a_increment  a_cumulative
1   "2024-05-17 09:00:04+02"    1            1
2   "2024-05-17 09:00:14+02"    2            3
3   "2024-05-17 09:00:24+02"    0            3
4   "2024-05-17 09:00:34+02"    5            8

This is the code for brute-forcing (it works, but it is very inefficient, took more than 5 days to run):

UPDATE data_cumulative_sums t
SET a_cumulative = COALESCE((SELECT SUM(a_increment) FROM "Data" WHERE "DateTime" <= t."DateTime"), 0);

This is where I tried to do it iteratively:

UPDATE data_cumulative_sums t
SET a_cumulative = COALESCE((SELECT COALESCE(t2.a_cumulative, 0) FROM data_cumulative_sums t2 WHERE t2.id = t.id - 1), 0) + t.a_increment 

This returns:

ID  "DateTime"                  a_increment  a_cumulative
1   "2024-05-17 09:00:04+02"    1            1
2   "2024-05-17 09:00:14+02"    2            2
3   "2024-05-17 09:00:24+02"    0            0
4   "2024-05-17 09:00:34+02"    5            5

If I run it again, it returns:

ID  "DateTime"                  a_increment  a_cumulative
1   "2024-05-17 09:00:04+02"    1            1
2   "2024-05-17 09:00:14+02"    2            3
3   "2024-05-17 09:00:24+02"    0            2
4   "2024-05-17 09:00:34+02"    5            5

If I run it as many times as I have rows, the expected result is achieved (but that would be too slow, of course)

I have found the window function:

Select sum(a_increment) over (order by "DateTime")
from data_cumulative_sums

which returns the expected result. However, when I try:

UPDATE data_cumulative_sums t
SET a_cumulative =sum(a_increment) OVER (ORDER BY "DateTime");

I get the error:

window functions are not allowed in UPDATE


Solution

  • ... however, window functions are not allowed in an update of the table.

    That's almost, but not quite true. (That error message might be more specific.) You can still use window functions in a subquery or CTE, and then self-join. Like:

    UPDATE "Data" t
    SET    a_cumulative = COALESCE(u.a_cumu, 0)
    FROM  (
       SELECT id, SUM(a_increment) OVER (ORDER BY "DateTime") AS a_cumu
       FROM   "Data"
       )  u
    WHERE  u.id = t.id;
    

    id being the primary key of the table. You can use any unique, not-null (combination of) column(s).

    If there can be duplicate values in "DateTime" you have to define how to deal with those exactly. This query includes all peers in a_cumulative.