Search code examples
rfunctiondata.tablelag

`data.table` in R - creating columns and using functions


I often find myself needing to calculate the lag percentage change of things and was wondering how my current approach could be simplified ?

At the moment, i am creating two new columns in my data - in each of the newly created columns i calculated the percentage change from previous year using the function lag. I prefer to keep working in data.table, as my data will be quite large and I for me the the non standard evaluation is much straight forward than in dplyr. Anyway, below is my (example) data.

How can the function be used inside the data.table (do in do not repeat myself that much)?

year <- c(2012, 2013, 2014, 2015)
value <- c (22,33,44,55)
amount <- c(99, 88, 77, 66)

mydata <- cbind(year, value, amount)
mydata <- as.data.table(mydata)

getPctLag(mydata$value)

mydata <- mydata[ , ':=' (value_pct = paste0(round((value/lag(value) - 1) * 100, digits = 3) , " %"),
                          amount_pct = paste0(round((amount/lag(amount) - 1) * 100, digits = 3) , " %"))]

getPctLag <- function(x){lag_pct = paste0(round((x/lag(x) - 1) * 100, digits = 3) , "%")}

Solution

  • You could specify columns to which you want to apply function in .SDcols. Also since you are using data.table it is better to use shift because lag is from dplyr.

    library(data.table)
    
    getPctLag <- function(x)  paste(round((x/shift(x) - 1) * 100, digits = 3) , "%")
    cols <- c("value", "amount")
    
    mydata[, paste0(cols, "pct") := lapply(.SD, getPctLag), .SDcols = cols]
    mydata
    
    #   year value amount value_pct amount_pct
    #1: 2012    22     99       NA%        NA%
    #2: 2013    33     88       50%   -11.111%
    #3: 2014    44     77   33.333%     -12.5%
    #4: 2015    55     66       25%   -14.286%