Since I can't find an answer in below questions:
Apply a recursive function over groups and rows without explicit for loop
How do I mimic the drag functionality for new rows such as in Excel but for R?
I'll try in asking a new question related to above. I.e, I want to apply a custom function recursively based on the output of previous values to the current row by group.
Example with a dataframe and a for loop:
for(i in 2:nrow(df1)) df1$z[i] <- df1$z[i-1] + df1$x[i-1] - df1$y[i-1]
Example with a dataframe and a for loop with custom function:
for(i in 2:nrow(df1)) df1$z[i] <- ifelse(df1$z[i-1] == df1$z[i],
df1$z[i-1] + df1$x[i-1] - df1$y[i-1],
df1$z[i-1] - df1$x[i-1] - df1$y[i-1])
However, with >1 mill rows, data.frames and for-loops are not optimal.
Is there any way to do above with data.table
or dtplyr
and optimized but also group-wise?
EDIT: See visualization of question.
It should first initiate from 2nd row like in for(i in 2:nrow(df)
and it should use the custom function if and only if group[i]==group[i-1]
Does this use of Reduce
do the trick?
tmp = data.table(
grp = c(rep(0,6), rep(1,6)),
x=c(10,20,30,40,50,60,1,2,3,4,5,6),
y=c(1,2,3,4,5,6, 10,20,30,40,50,60)
)
tmp[, z:=Reduce(f=function(z,i) z + x[i-1] - y[i-1],
x=(1:.N)[-1],
init=0,
accumulate = T)
,by=grp
]
Output:
grp x y z
1: 0 10 1 0
2: 0 20 2 9
3: 0 30 3 27
4: 0 40 4 54
5: 0 50 5 90
6: 0 60 6 135
7: 1 1 10 0
8: 1 2 20 -9
9: 1 3 30 -27
10: 1 4 40 -54
11: 1 5 50 -90
12: 1 6 60 -135
Take for example, row 4. The value in the z column is 54, which is equal to the prior row's z-value + prior row's x-value, minus prior row's y-value.
The function f
within Reduce can take any complicated form, including ifelse
statements. Here is an example, where I've made a function called func
, which is a wrapper around Reduce
. Notice that within the Reduce statement, f
is a function taking prev
(thanks to suggestion by @r2evans), and this function first calculates previous row's s
value minus previous row's t
value (this is akin to your x[-1]-y[-1]
. Then there is an ifelse
statement. If the difference between the prior rows s
and t
value (i.e. k
) is >20, then the new value in this row will be the previous z
value minus the product of 20-4k (i.e. prev-(20-4k)
), otherwise it will the previous z
value + k
(i.e. which is equal to your original formulation: z[i-1]+x[i-1]-y[i-1]
)
func <- function(s,t) {
Reduce(
f=function(prev,i) {
k=s[i-1] - t[i-1]
ifelse(k>10, prev -(20-4*k), prev+k)
},
x=2:length(s),
init=0,
accumulate = TRUE
)
}
You can then assign the value of the func(x,y)
to z, like this:
tmp[, z:=func(x,y), by=.(grp)][]
Output:
grp x y z
1: 0 10 1 0
2: 0 20 2 9
3: 0 30 3 61
4: 0 40 4 149
5: 0 50 5 273
6: 0 60 6 433
7: 1 1 10 0
8: 1 2 20 -9
9: 1 3 30 -27
10: 1 4 40 -54
11: 1 5 50 -90
12: 1 6 60 -135