Search code examples
rdatedata.tablelag

choose semi-last observations based on date in data.table in R


I have a data.table with dates in it (as factor variables). I am getting the lag values from these. How can I tell R to run the get the lag values only for the observations dated semi-last? In this case this would be start == "01.01.2015"?

example data:

ID <- rep("A5", 15)
product <- rep(c("prod1","prod2","prod3", "prod55", "prod4", "prod9", "prod83"),3)
start <- c(rep("01.01.2016", 3), rep("01.01.2015", 3), rep("01.01.2014",3),
           rep("01.01.2013",3), rep("01.01.2012",3))
prodID <- c(3,1,2,3,1,2,3,1,2,3,2,1,3,1,2)
mydata <- cbind(ID, product[1:15], start, prodID)
mydata <- as.data.table(mydata)

mydata[, (nameCols) := shift(.SD, 3, fill = "NA", "lead"), .SDcols= c("start", "V2"), by = "prodID"]

For now I have used this to get to my results:

mydata[start == "01.01.2015"]

The problem is that the semi-last date is not always the same date. I will be repeating this procedure many times and i want to avoid having to specify this by hand. Any ideas?


Solution

  • Convert the data to date object and sort to select semi-last date.

    library(data.table)
    
    mydata[, start := as.IDate(start, '%d.%m.%Y')]
    mydata[start == sort(unique(start), decreasing = TRUE)[2]]
    
    
    #   ID     V2      start prodID
    #1: A5 prod55 2015-01-01      3
    #2: A5  prod4 2015-01-01      1
    #3: A5  prod9 2015-01-01      2