Tags: r, loops, optimization, iteration, knime

Optimizing an iterative calculation in R without loops


I have to apply an iterative calculation to the rows of a data.frame in R. The problem is that the result for each row depends on the results already computed for the previous rows.

I have implemented the solution using a loop like the following example:

example <- data.frame(flag_new = c(TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE),
                      percentage =sample(1:100,22)/100)
n.Row <- nrow(example)

# initialization
example$K <-0
example$R <-0
example$K[1] <-100
example$R[1] <-example$K[1]*example$percentage[1]

#loop
for(i in 2:n.Row){
  if(example$flag_new[i]){
    example$K[i] <-100

  } else {
    example$K[i] <-example$K[i-1]-example$R[i-1]
  }
  example$R[i] <- example$K[i]*example$percentage[i]
}

The problem is that the real code is very slow (especially when I run it in an R snippet in KNIME).

Is there any way to optimize the code in a more efficient R-like way? I tried to use the apply family but it doesn't seem to work in my case.

Thank you very much


Solution

  • Here is a data.table approach that groups the rows by the cumulative sum of flag_new:

    set.seed(1)
    example <- data.frame(flag_new = c(TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE),
                          percentage =sample(1:100,22)/100)    
    
    # initialization
    initK = 100
    
    # Copy to allow comparison to your code
    newd = example
    
    library(data.table)
    setDT(newd)[, Knew:= initK* c(1, cumprod(1 - percentage[-.N])), 
                                  by=cumsum(flag_new)][, Rnew:=Knew* percentage]
    

    Compare with the results of the loop from your question, run on this same seeded example:

    all.equal(example$K, newd$Knew) 
    all.equal(example$R, newd$Rnew)
    

    Grouping the rows from each TRUE up to (but not including) the next TRUE lets the calculation be done without a loop.
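
    To illustrate the grouping idea, here is a minimal sketch on a short, made-up flag vector (not the data from the question). Each TRUE starts a new group, so the cumulative sum of the flags can serve as a group id:

```r
# Hypothetical flags, shorter than the question's data, for illustration only.
flag_new <- c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)

# cumsum() treats TRUE as 1 and FALSE as 0, so the running total
# increases by one at every TRUE and stays constant in between.
group_id <- cumsum(flag_new)
print(group_id)
# 1 1 1 2 2 3
```

    This assumes the first row is flagged TRUE, as in the question, so that every group begins with a reset of K.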

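    For example, the calculation for the first group (rows 1 to 8) can be done as:

    d = example[1:8, ]
    # start from the group's initial value (initK), not from the K column filled in by the loop
    d$K1 <- initK * c(1, cumprod(1 - d$percentage[-length(d$percentage)]))
    d$R2 <- with(d, K1 * percentage)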
    

    This follows from the recurrence used in the loop (writing p for percentage):

    K[i] = K[i-1] - R[i-1]
    K[i] = K[i-1] - K[i-1] * p[i-1]
         = K[i-1] * (1 - p[i-1])
    So
    K[2] = K[1] * (1 - p[1])
    K[3] = K[2] * (1 - p[2]) = K[1] * (1 - p[1]) * (1 - p[2])
    K[4] = K[3] * (1 - p[3]) = K[1] * (1 - p[1]) * (1 - p[2]) * (1 - p[3])
    and so on..
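
    As a quick sanity check of the expansion above, the following sketch compares the recurrence with the cumulative-product form for a single group. The percentages are made up for illustration and initK is assumed to be 100, as in the question:

```r
# Hypothetical percentages for one group (not the question's data).
p <- c(0.10, 0.20, 0.50, 0.25)
initK <- 100

# Recurrence: K[i] = K[i-1] * (1 - p[i-1])
K_loop <- numeric(length(p))
K_loop[1] <- initK
for (i in 2:length(p)) {
  K_loop[i] <- K_loop[i - 1] * (1 - p[i - 1])
}

# Closed form: K[i] = K[1] * (1 - p[1]) * ... * (1 - p[i-1]).
# The last percentage is dropped because it only affects the next K.
K_closed <- initK * c(1, cumprod(1 - p[-length(p)]))

print(K_closed)
print(all.equal(K_loop, K_closed))
```

    Both vectors should agree up to floating-point tolerance, which is why all.equal is used rather than identical.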
    

    So all that is needed is a split-apply-combine step that computes these cumulative products within each group, and I used data.table for that.
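
    If data.table is not available (for example inside a restricted KNIME R environment), the same split-apply-combine idea can be sketched in base R with ave(). This is an alternative under the same assumptions as above (initK is 100 and the first row is flagged TRUE), not part of the data.table answer itself:

```r
set.seed(1)
example <- data.frame(
  flag_new = c(TRUE, rep(FALSE, 7), TRUE, rep(FALSE, 5), TRUE, rep(FALSE, 7)),
  percentage = sample(1:100, 22) / 100
)

initK <- 100

# Group id: increases by one at every TRUE in flag_new.
grp <- cumsum(example$flag_new)

# ave() applies FUN to the percentages of each group and
# returns the results in the original row order.
example$K <- initK * ave(
  example$percentage, grp,
  FUN = function(p) c(1, cumprod(1 - p[-length(p)]))
)
example$R <- example$K * example$percentage

head(example)
```

    Whether this is faster or slower than data.table on the real data is not something the sketch shows; it mainly avoids the row-by-row updates of a data.frame, which are usually the slow part of the original loop.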