I often find myself needing to calculate the lag percentage change of things and was wondering how my current approach could be simplified ?
At the moment, i am creating two new columns in my data - in each of the newly created columns i calculated the percentage change from previous year using the function lag
. I prefer to keep working in data.table
, as my data will be quite large and I for me the the non standard evaluation is much straight forward than in dplyr
. Anyway, below is my (example) data.
How can the function be used inside the data.table (do in do not repeat myself that much)?
year <- c(2012, 2013, 2014, 2015)
value <- c (22,33,44,55)
amount <- c(99, 88, 77, 66)
mydata <- cbind(year, value, amount)
mydata <- as.data.table(mydata)
getPctLag(mydata$value)
mydata <- mydata[ , ':=' (value_pct = paste0(round((value/lag(value) - 1) * 100, digits = 3) , " %"),
amount_pct = paste0(round((amount/lag(amount) - 1) * 100, digits = 3) , " %"))]
getPctLag <- function(x){lag_pct = paste0(round((x/lag(x) - 1) * 100, digits = 3) , "%")}
You could specify columns to which you want to apply function in .SDcols
. Also since you are using data.table
it is better to use shift
because lag
is from dplyr
.
library(data.table)
getPctLag <- function(x) paste(round((x/shift(x) - 1) * 100, digits = 3) , "%")
cols <- c("value", "amount")
mydata[, paste0(cols, "pct") := lapply(.SD, getPctLag), .SDcols = cols]
mydata
# year value amount value_pct amount_pct
#1: 2012 22 99 NA% NA%
#2: 2013 33 88 50% -11.111%
#3: 2014 44 77 33.333% -12.5%
#4: 2015 55 66 25% -14.286%