I have a data.table with dates in it (as factor variables). I am getting the lag values from these. How can I tell R to run the get the lag values only for the observations dated semi-last? In this case this would be start == "01.01.2015"
?
example data:
ID <- rep("A5", 15)
product <- rep(c("prod1","prod2","prod3", "prod55", "prod4", "prod9", "prod83"),3)
start <- c(rep("01.01.2016", 3), rep("01.01.2015", 3), rep("01.01.2014",3),
rep("01.01.2013",3), rep("01.01.2012",3))
prodID <- c(3,1,2,3,1,2,3,1,2,3,2,1,3,1,2)
mydata <- cbind(ID, product[1:15], start, prodID)
mydata <- as.data.table(mydata)
mydata[, (nameCols) := shift(.SD, 3, fill = "NA", "lead"), .SDcols= c("start", "V2"), by = "prodID"]
For now I have used this to get to my results:
mydata[start == "01.01.2015"]
The problem is that the semi-last date is not always the same date. I will be repeating this procedure many times and i want to avoid having to specify this by hand. Any ideas?
Convert the data to date object and sort
to select semi-last date.
library(data.table)
mydata[, start := as.IDate(start, '%d.%m.%Y')]
mydata[start == sort(unique(start), decreasing = TRUE)[2]]
# ID V2 start prodID
#1: A5 prod55 2015-01-01 3
#2: A5 prod4 2015-01-01 1
#3: A5 prod9 2015-01-01 2