Search code examples
rcumulative-sum

Cumulative sum for each row of data up to 100, then restart cumsum on next row of dataframe (row 1,2,3 etc.)


I am trying to calculate the cumulative sum of a column in a dataframe that is greater than or equal to 100, and then restart the calculation on the next row of the dataframe. I want to run summary statistics on each grouping of data but for the life of me cannot figure out a straightforward method to move onto the next row in the dataframe on which to begin the next calculation rather than at the row where the iteration reached 100.

df = data.frame(unit_no = seq(1,10,1), length_m = round(runif(10, 20, 60)))

df
  unit_no length_m
1        1       39
2        2       55
3        3       36
4        4       49
5        5       55
6        6       21
7        7       22
8        8       59
9        9       31
10      10       35

test = as.data.frame(NULL)
temp = 0

for (i in 1:nrow(df)){
  temp = temp + df[i, "length_m"] 
  
  if (temp >=100){
    
    test = rbind(test, temp)
    temp = 0
    
  }
}

and the resulting dataframe that I am trying to get would be:

  unit_start_end sum_length
1            1-3        130
2            2-4        140
3            3-5        140
4            4-5        104
5            5-8        157
6            6-8        102
7            7-9        112
8           8-10        125

Where the range of units are recorded where the value is >=100. Right now the code above calculates >=100 but does not start back to the next row of data in sequence.

Any help is greatly appreciated!


Solution

  • Here's a solution in base R:

    do.call("rbind", lapply(seq(nrow(df)), function(i) {
      ind <- i + 0:(sum(cumsum(df$length_m[i:nrow(df)]) < 100))
      if(any(ind > nrow(df))) return(NULL)
      data.frame(unit_start_end = paste(min(ind), max(ind), sep = " - "),
                 sum_len = sum(df$length_m[ind]))
    }))
    #>   unit_start_end sum_len
    #> 1          1 - 3     130
    #> 2          2 - 4     140
    #> 3          3 - 5     140
    #> 4          4 - 5     104
    #> 5          5 - 8     157
    #> 6          6 - 8     102
    #> 7          7 - 9     112
    #> 8         8 - 10     125
    

    Created on 2023-08-06 with reprex v2.0.2