r, for-loop, cumsum

Cumulative sum with potential "resets"


I have a numeric vector for which I want to calculate a modified cumulative sum. I say "modified" because a standard cumulative sum is cumsum[i] = cumsum[i-1] + x[i], whereas I need cumsum[i] = max(cumsum[i-1] + x[i], x[i]).

That is, if the most recent element is greater than the cumulative sum including it (which happens exactly when the sum accumulated before it is negative), the result resets to that element.

This can obviously be done manually with a trivial for-loop:

set.seed(1)
x <- runif(10, min = -1, max = 1)
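csum <- numeric(length(x))
for (i in seq_along(x)) {
  if (i == 1) {
    csum[i] <- x[i]  # first element: nothing accumulated yet
  } else {
    csum[i] <- max(csum[i - 1] + x[i], x[i])  # reset when the previous sum is negative
  }
}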
x; csum
#>  [1] -0.4689827 -0.2557522  0.1457067  0.8164156 -0.5966361  0.7967794
#>  [7]  0.8893505  0.3215956  0.2582281 -0.8764275
#>  [1] -0.4689827 -0.2557522  0.1457067  0.9621223  0.3654862  1.1622655
#>  [7]  2.0516161  2.3732117  2.6314397  1.7550123

Created on 2020-04-27 by the reprex package (v0.3.0)

But is there a way of doing this which avoids a for-loop? I've been banging my head trying to think of one but just can't.

If relevant, my real case will apply this to a dataframe. It will be grouped, and then I'll create a new column with this cumulative sum for each group. I'm comfortable with that part; I just can't figure out how to clean up this operation.


Solution

  • We can use Reduce with accumulate = TRUE in base R:

    csum2 <- Reduce(function(u, v) max(u + v, v), x, accumulate = TRUE)
    

    Checking against the OP's output:

    identical(csum, csum2)
    #[1] TRUE
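
Reduce still iterates element by element internally. If a fully vectorised form is wanted, the recurrence can be unrolled. This is a sketch that is not part of the original answer: csum[i] is the largest sum of any run of elements ending at i, so it equals the ordinary running total minus the smallest running total seen before i (counting 0 for the empty prefix). The helper name reset_cumsum_vec is made up for illustration.

```r
# Hypothetical helper (not from the original answer): closed form of
# csum[i] = max(csum[i-1] + x[i], x[i]) using only vectorised primitives.
reset_cumsum_vec <- function(x) {
  if (length(x) == 0) return(numeric(0))
  s <- cumsum(x)  # ordinary running total
  # Smallest running total strictly before position i, with 0 standing
  # in for the empty prefix.
  floor_before <- cummin(c(0, s[-length(s)]))
  s - floor_before
}

set.seed(1)
x <- runif(10, min = -1, max = 1)
ref <- Reduce(function(u, v) max(u + v, v), x, accumulate = TRUE)

# Compare with a tolerance rather than identical(): the arithmetic is
# reordered, so the last few bits may differ from the Reduce result.
all.equal(reset_cumsum_vec(x), ref)
```

Because the subtraction happens after the full cumsum, this version can lose precision when the running total grows much larger than the individual values, and it does not treat NA specially. For short vectors the Reduce version is likely just as practical.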
    

    Another option is accumulate() from purrr:

    library(purrr)
    accumulate(x, ~  max(.x + .y, .y))
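
For the grouped data frame case mentioned in the question, the same reducer can be applied per group. The following is a minimal base R sketch under assumed names: the data frame df, its columns g and x, and the wrapper reset_cumsum are all made up for illustration and do not come from the original post.

```r
# Toy data; the column names `g` and `x` are hypothetical.
df <- data.frame(
  g = c("a", "a", "a", "b", "b", "b"),
  x = c(1, -3, 2, -1, 4, -2)
)

# The Reduce-based reducer from the answer, wrapped so it can be applied
# to one group's values at a time.
reset_cumsum <- function(v) {
  Reduce(function(u, w) max(u + w, w), v, accumulate = TRUE)
}

# ave() splits x by g, applies FUN to each piece, and writes the results
# back in the original row order.
df$csum <- ave(df$x, df$g, FUN = reset_cumsum)
df
```

With dplyr, the equivalent step would be group_by(g) followed by mutate(csum = reset_cumsum(x)), and purrr's accumulate() can be substituted for Reduce inside the wrapper.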