Search code examples
rdata.tablecumsum

Cumulative sum in r based on another column excluding the current value for more than one column


I want to take cumulative sum of a column based on another column and the code below successfully does it. But additionally I need to exclude the current element.

library(data.table)

cat_var <- c("rock", "indie", "rock", "rock", "pop", "indie", "rock")
cat_var_2 <- c("blue", "green", "red", "red", "blue", "green", "blue")
target_var <- c(0, 0, 1, 1, 1, 1, 0)
df <- data.table("categorical_variable" = cat_var, "categorical_variable_2" = cat_var_2, "target_variable" =  target_var)

ave(df[,"target_variable"], df[,c("categorical_variable")], FUN=cumsum) 

For now I can take cumulative sum of the target_variable based on categorical_variable. I want to take cumulative sums of both categorical_variable and categorical_variable_2 in one piece of code that excludes the current value. Like this:

ave(df[,"target_variable"], df[,c("categorical_variable", "categorical_variable_2")], FUN=cumsum) 

Expected output is:

categorical_variable_transformed <- c(0, 0, 0, 1, 0, 0, 2)
categorical_variable_2_transformed <- c(0, 0, 0, 1, 0, 0, 1)
df$categorical_variable_transformed <- categorical_variable_transformed
df$categorical_variable_2_transformed <- categorical_variable_2_transformed

Solution

  • Try this:

    library(data.table)
    nms <- c("categorical_variable", "categorical_variable_2")
    df[, paste0(nms, "_transformed") :=
           lapply(nms, \(g) ave(target_variable, get(g), FUN = cumsum) - target_variable)]
    df
    #    categorical_variable categorical_variable_2 target_variable categorical_variable_transformed categorical_variable_2_transformed
    #                  <char>                 <char>           <num>                            <num>                              <num>
    # 1:                 rock                   blue               0                                0                                  0
    # 2:                indie                  green               0                                0                                  0
    # 3:                 rock                    red               1                                0                                  0
    # 4:                 rock                    red               1                                1                                  1
    # 5:                  pop                   blue               1                                0                                  0
    # 6:                indie                  green               1                                0                                  0
    # 7:                 rock                   blue               0                                2                                  1