I have a data table that looks like this:
DT<-data.table(day=c(1,2,3,4,5,6,7,8),Consumption=c(5,9,10,2,NA,NA,NA,NA),id=c(1,2,3,1,1,2,2,1))
day Consumption id
1: 1 5 1
2: 2 9 2
3: 3 10 3
4: 4 2 1
5: 5 NA 1
6: 6 NA 2
7: 7 NA 2
8: 8 NA 1
I want to create two columns that show the last non-Na consumption value before the observation, and the day difference between those observations using the id groups. So far, I tried this:
DT[, j := day-shift(day, fill = NA,n=1), by = id]
DT[, yj := shift(Consumption, fill = NA,n=1), by = id]
day Consumption id j yj
1: 1 5 1 NA NA
2: 2 9 2 NA NA
3: 3 10 3 NA NA
4: 4 2 1 3 5
5: 5 NA 1 1 2
6: 6 NA 2 4 9
7: 7 NA 2 1 NA
8: 8 NA 1 3 NA
However, I want that the lagged consumption values with n=1 come from the rows which have non-NA consumption values. For example, in the 7th row and column "yj", the yj value is NA because it comes from the 6th row which has NA consumption. I want it to come from the 2nd row. Therefore, I would like the end up with this data table:
day Consumption id j yj
1: 1 5 1 NA NA
2: 2 9 2 NA NA
3: 3 10 3 NA NA
4: 4 2 1 3 5
5: 5 NA 1 1 2
6: 6 NA 2 4 9
7: 7 NA 2 5 9
8: 8 NA 1 4 2
Note: The reason for specifically using the parameter n of shift function is that I will also need the 2nd last non-Na consumption values in the next step.
Thank You
Here's a data.table solution with an assist from zoo:
library(data.table)
library(zoo)
DT[, `:=`(day_shift = shift(day),
yj = shift(Consumption)),
by = id]
#make the NA yj records NA for the days
DT[is.na(yj), day_shift := NA_integer_]
#fill the DT with the last non-NA value
DT[,
`:=`(day_shift = na.locf(day_shift, na.rm = F),
yj = zoo::na.locf(yj, na.rm = F)),
by = id]
# finally calculate j
DT[, j:= day - day_shift]
# you can clean up the ordering or remove columns later
DT
day Consumption id day_shift yj j
1: 1 5 1 NA NA NA
2: 2 9 2 NA NA NA
3: 3 10 3 NA NA NA
4: 4 2 1 1 5 3
5: 5 NA 1 4 2 1
6: 6 NA 2 2 9 4
7: 7 NA 2 2 9 5
8: 8 NA 1 4 2 4