I am trying to create new variables in my data set that are cumulative totals which restart based on other variables (using group by)… I want these to be new columns in the data set and this is the part I am struggling with...
Using the data below, I want to create cumulative Sale and Profit columns that will restart for every Product and Product_Cat grouping.
The below code partly gives me what I need, but the variables are not new variables, instead it overwrites the existing Sale/Profit... what am I getting wrong? I imagine this is simple haven't found anything.
Note: I'm using lapply as my real data set has 40+ varbs that I need to create calculations for.
DT <- setDT(Data)[,lapply(.SD, cumsum), by = .(Product,Product_Cat) ]
Data for example:
Product <- c('A','A','A','B','B','B','C','C','C')
Product_Cat <- c('S1','S1','S2','C1','C1','C1','D1','E1','F1')
Sale <- c(10,15,5,20,15,10,5,5,5)
Profit <- c(2,4,2,6,8,2,4,6,8)
Sale_Cum <- c(10,25,5,20,35,45,5,5,5)
Profit_Cum <- c(2,6,2,6,14,16,4,6,8)
Data <- data.frame(Product,Product_Cat,Sale,Profit)
Desired_Data <- data.frame(Product,Product_Cat,Sale,Profit,Sale_Cum,Profit_Cum)
library(data.table)
setDT(Data)
cols <- names(Data)[3:4]
Data[, paste0(cols, '_cumsum') := lapply(.SD, cumsum)
, by = .(Product, Product_Cat)
, .SDcols = cols]