Tags: r, rolling-computation, accumulate

Fast way to calculate value in cell based on value in previous row (data.table)


Say I have the following data.table dt and a constant named constant.

dt <- structure(list(var1 = c(-92186.7470607738, -19163.5035325072, 
                              -18178.8396858014, -9844.67882723287, -16494.7802822178, -17088.0576319257
), var2 = c(-3.12, NA, NA, NA, NA, NA)), class = c("data.table", 
                                                   "data.frame"), row.names = c(NA, -6L))

constant <- 608383

print(dt)
         var1  var2
1: -92186.747 -3.12
2: -19163.504    NA
3: -18178.840    NA
4:  -9844.679    NA
5: -16494.780    NA
6: -17088.058    NA

The value of var2 in each row depends on the value of var2 in the previous row, like so:

for(i in 2:nrow(dt)){
  prev <- dt[(i-1),]
  dt[i, var2 := prev$var2 - var1/constant]
}

print(dt)
         var1      var2
1: -92186.747 -3.120000
2: -19163.504 -3.088501
3: -18178.840 -3.058620
4:  -9844.679 -3.042439
5: -16494.780 -3.015326
6: -17088.058 -2.987238

Though this for loop provides the desired output, it is very slow on a big dataset. Is there a faster way to achieve this output? Preferably using data.table.


Solution

  • Here is a solution using the accumulate function from the purrr package, in case you are interested. In the formula, .y is the current value of var1 being iterated over and .x is the accumulated value that ends up in the var2 column. Note that the first value of var1 is excluded (var1[-1]) because the formula is not applied to the first row; var2[1] is passed as .init instead.

    library(dplyr)
    library(purrr)
    
    # Fold over var1[-1], starting from var2[1]:
    # .x is the running var2, .y is the current var1
    dt %>%
      mutate(var2 = accumulate(var1[-1], .init = var2[1], ~ .x - .y / constant))
    
    
             var1      var2
    1: -92186.747 -3.120000
    2: -19163.504 -3.088501
    3: -18178.840 -3.058620
    4:  -9844.679 -3.042439
    5: -16494.780 -3.015326
    6: -17088.058 -2.987238
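
Since the question asked for something in data.table, here is a minimal sketch of a vectorised alternative. It assumes the recurrence is exactly the one in the question's loop, in which case unrolling it gives var2[i] = var2[1] - sum(var1[2:i]) / constant, so a single cumsum can replace the row-by-row update. The name check is only an illustrative helper, and the Reduce call is there purely as a cross-check against the step-by-step recurrence.

```r
library(data.table)

dt <- data.table(
  var1 = c(-92186.7470607738, -19163.5035325072, -18178.8396858014,
           -9844.67882723287, -16494.7802822178, -17088.0576319257),
  var2 = c(-3.12, NA, NA, NA, NA, NA)
)
constant <- 608383

# Unrolled recurrence: var2[i] = var2[1] - sum(var1[2:i]) / constant.
# c(0, var1[-1]) puts a zero in the first position so row 1 keeps its initial value.
dt[, var2 := var2[1] - cumsum(c(0, var1[-1])) / constant]

# Cross-check (hypothetical helper name): the same recurrence applied step by step
# with base R's Reduce, starting from var2[1].
check <- Reduce(function(acc, v) acc - v / constant,
                dt$var1[-1], accumulate = TRUE, dt$var2[1])
stopifnot(isTRUE(all.equal(dt$var2, check)))

print(dt)
```

Because cumsum adds the terms in a different order than the loop does, the results may differ from the loop's in the last few floating-point digits, which is why the comparison uses all.equal rather than identical.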