I want to take cumulative sum of a column based on another column and the code below successfully does it. But additionally I need to exclude the current element.
library(data.table)
cat_var <- c("rock", "indie", "rock", "rock", "pop", "indie", "rock")
cat_var_2 <- c("blue", "green", "red", "red", "blue", "green", "blue")
target_var <- c(0, 0, 1, 1, 1, 1, 0)
df <- data.table("categorical_variable" = cat_var, "categorical_variable_2" = cat_var_2, "target_variable" = target_var)
ave(df[,"target_variable"], df[,c("categorical_variable")], FUN=cumsum)
For now I can take cumulative sum of the target_variable
based on categorical_variable
. I want to take cumulative sums of both categorical_variable
and categorical_variable_2
in one piece of code that excludes the current value. Like this:
ave(df[,"target_variable"], df[,c("categorical_variable", "categorical_variable_2")], FUN=cumsum)
Expected output is:
categorical_variable_transformed <- c(0, 0, 0, 1, 0, 0, 2)
categorical_variable_2_transformed <- c(0, 0, 0, 1, 0, 0, 1)
df$categorical_variable_transformed <- categorical_variable_transformed
df$categorical_variable_2_transformed <- categorical_variable_2_transformed
Try this:
library(data.table)
nms <- c("categorical_variable", "categorical_variable_2")
df[, paste0(nms, "_transformed") :=
lapply(nms, \(g) ave(target_variable, get(g), FUN = cumsum) - target_variable)]
df
# categorical_variable categorical_variable_2 target_variable categorical_variable_transformed categorical_variable_2_transformed
# <char> <char> <num> <num> <num>
# 1: rock blue 0 0 0
# 2: indie green 0 0 0
# 3: rock red 1 0 0
# 4: rock red 1 1 1
# 5: pop blue 1 0 0
# 6: indie green 1 0 0
# 7: rock blue 0 2 1