Search code examples
rdata.tablelapplycumsumcumulative-sum

R loop/lapply, cumulative totals with group by


I am trying to create new variables in my data set that are cumulative totals which restart based on other variables (using group by)… I want these to be new columns in the data set and this is the part I am struggling with...

Using the data below, I want to create cumulative Sale and Profit columns that will restart for every Product and Product_Cat grouping.

The below code partly gives me what I need, but the variables are not new variables, instead it overwrites the existing Sale/Profit... what am I getting wrong? I imagine this is simple haven't found anything.

Note: I'm using lapply as my real data set has 40+ varbs that I need to create calculations for.

DT <- setDT(Data)[,lapply(.SD, cumsum), by = .(Product,Product_Cat) ]

Data for example:

Product <- c('A','A','A','B','B','B','C','C','C')
Product_Cat <- c('S1','S1','S2','C1','C1','C1','D1','E1','F1')
Sale <- c(10,15,5,20,15,10,5,5,5)
Profit <- c(2,4,2,6,8,2,4,6,8)
Sale_Cum <- c(10,25,5,20,35,45,5,5,5)
Profit_Cum <- c(2,6,2,6,14,16,4,6,8)

Data <- data.frame(Product,Product_Cat,Sale,Profit)
Desired_Data <- data.frame(Product,Product_Cat,Sale,Profit,Sale_Cum,Profit_Cum)

Solution

  • library(data.table)
    setDT(Data)
    
    cols <- names(Data)[3:4]
    
    Data[, paste0(cols, '_cumsum') := lapply(.SD, cumsum)
         , by = .(Product, Product_Cat) 
         , .SDcols = cols]