Search code examples
rdatatablenashift

Shifting the last non-NA value by id


I have a data table that looks like this:

DT<-data.table(day=c(1,2,3,4,5,6,7,8),Consumption=c(5,9,10,2,NA,NA,NA,NA),id=c(1,2,3,1,1,2,2,1))

   day Consumption id
1:   1           5  1
2:   2           9  2
3:   3          10  3
4:   4           2  1
5:   5          NA  1
6:   6          NA  2
7:   7          NA  2
8:   8          NA  1

I want to create two columns that show the last non-Na consumption value before the observation, and the day difference between those observations using the id groups. So far, I tried this:

DT[, j := day-shift(day, fill = NA,n=1), by = id]
DT[, yj := shift(Consumption, fill = NA,n=1), by = id]

   day Consumption id  j yj
1:   1           5  1 NA NA
2:   2           9  2 NA NA
3:   3          10  3 NA NA
4:   4           2  1  3  5
5:   5          NA  1  1  2
6:   6          NA  2  4  9
7:   7          NA  2  1 NA
8:   8          NA  1  3 NA 

However, I want that the lagged consumption values with n=1 come from the rows which have non-NA consumption values. For example, in the 7th row and column "yj", the yj value is NA because it comes from the 6th row which has NA consumption. I want it to come from the 2nd row. Therefore, I would like the end up with this data table:

   day Consumption id  j yj
1:   1           5  1 NA NA
2:   2           9  2 NA NA
3:   3          10  3 NA NA
4:   4           2  1  3  5
5:   5          NA  1  1  2
6:   6          NA  2  4  9
7:   7          NA  2  5  9
8:   8          NA  1  4  2

Note: The reason for specifically using the parameter n of shift function is that I will also need the 2nd last non-Na consumption values in the next step.

Thank You


Solution

  • Here's a solution with an assist from :

    library(data.table)
    library(zoo)
    
    DT[, `:=`(day_shift = shift(day),
              yj = shift(Consumption)),
       by = id]
    
    #make the NA yj records NA for the days
    DT[is.na(yj), day_shift := NA_integer_]
    
    #fill the DT with the last non-NA value
    DT[,
       `:=`(day_shift = na.locf(day_shift, na.rm = F),
              yj = zoo::na.locf(yj, na.rm = F)),
       by = id]
    
    # finally calculate j
    DT[, j:= day - day_shift]
    
    # you can clean up the ordering or remove columns later
    DT
    
       day Consumption id day_shift yj  j
    1:   1           5  1        NA NA NA
    2:   2           9  2        NA NA NA
    3:   3          10  3        NA NA NA
    4:   4           2  1         1  5  3
    5:   5          NA  1         4  2  1
    6:   6          NA  2         2  9  4
    7:   7          NA  2         2  9  5
    8:   8          NA  1         4  2  4