Search code examples
rrecursiondplyrdata.tabledtplyr

How to apply a custom recursive function with data.table and loop over each index group-wise?


Since I can't find an answer in below questions:

Apply a recursive function over groups and rows without explicit for loop

How do I mimic the drag functionality for new rows such as in Excel but for R?

I'll try in asking a new question related to above. I.e, I want to apply a custom function recursively based on the output of previous values to the current row by group.

Example with a dataframe and a for loop:

for(i in 2:nrow(df1)) df1$z[i] <- df1$z[i-1] + df1$x[i-1] - df1$y[i-1] 

Example with a dataframe and a for loop with custom function:

for(i in 2:nrow(df1)) df1$z[i] <- ifelse(df1$z[i-1] == df1$z[i],
                                          df1$z[i-1] + df1$x[i-1] - df1$y[i-1],
                                          df1$z[i-1] - df1$x[i-1] - df1$y[i-1])

However, with >1 mill rows, data.frames and for-loops are not optimal.

Is there any way to do above with data.table or dtplyr and optimized but also group-wise?

EDIT: See visualization of question.

It should first initiate from 2nd row like in for(i in 2:nrow(df) and it should use the custom function if and only if group[i]==group[i-1] enter image description here


Solution

  • Does this use of Reduce do the trick?

    tmp = data.table(
      grp = c(rep(0,6), rep(1,6)),
      x=c(10,20,30,40,50,60,1,2,3,4,5,6),
      y=c(1,2,3,4,5,6, 10,20,30,40,50,60)
    )
    tmp[, z:=Reduce(f=function(z,i) z + x[i-1] - y[i-1],
                    x=(1:.N)[-1],
                    init=0,
                    accumulate = T)
        ,by=grp
    ]
    

    Output:

        grp  x  y    z
     1:   0 10  1    0
     2:   0 20  2    9
     3:   0 30  3   27
     4:   0 40  4   54
     5:   0 50  5   90
     6:   0 60  6  135
     7:   1  1 10    0
     8:   1  2 20   -9
     9:   1  3 30  -27
    10:   1  4 40  -54
    11:   1  5 50  -90
    12:   1  6 60 -135
    

    Take for example, row 4. The value in the z column is 54, which is equal to the prior row's z-value + prior row's x-value, minus prior row's y-value.

    The function f within Reduce can take any complicated form, including ifelse statements. Here is an example, where I've made a function called func, which is a wrapper around Reduce. Notice that within the Reduce statement, f is a function taking prev (thanks to suggestion by @r2evans), and this function first calculates previous row's s value minus previous row's t value (this is akin to your x[-1]-y[-1]. Then there is an ifelse statement. If the difference between the prior rows s and t value (i.e. k) is >20, then the new value in this row will be the previous z value minus the product of 20-4k (i.e. prev-(20-4k)), otherwise it will the previous z value + k (i.e. which is equal to your original formulation: z[i-1]+x[i-1]-y[i-1])

    func <- function(s,t) {
      Reduce(
        f=function(prev,i) {
          k=s[i-1] - t[i-1]
          ifelse(k>10, prev -(20-4*k), prev+k)
        },
        x=2:length(s),
        init=0,
        accumulate = TRUE
      )
    }
    

    You can then assign the value of the func(x,y) to z, like this:

    tmp[, z:=func(x,y), by=.(grp)][]
    

    Output:

        grp  x  y    z
     1:   0 10  1    0
     2:   0 20  2    9
     3:   0 30  3   61
     4:   0 40  4  149
     5:   0 50  5  273
     6:   0 60  6  433
     7:   1  1 10    0
     8:   1  2 20   -9
     9:   1  3 30  -27
    10:   1  4 40  -54
    11:   1  5 50  -90
    12:   1  6 60 -135