
Change maxgap for number of times a value is carried forward


I have a data frame similar to the following:

library(data.table)
test <- data.table(data.frame("value" = c(5,NA,8,NA,NA,8,6,NA,NA,10),
                              "locf_N" = c(1,NA,1,NA,NA,1,2,NA,NA,2)) )

In this data frame, the variable locf_N gives the number of times the last observation may be carried forward. This number is not fixed: it varies from one observation to the next. I tried the maxgap argument of na.locf for this, but it does not do what I need. maxgap is a single limit for the whole vector, and a run of NAs longer than that limit is left completely unfilled rather than filled for the first few positions.

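library(zoo)
# Attempt 1: one fixed limit for every gap
test[,value := na.locf(value, na.rm = FALSE, maxgap = 1)]
# Attempt 2: pass the per-row limits; maxgap expects a single number, so this is not supported
test[,value := na.locf(value, na.rm = FALSE, maxgap = locf_N)]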

Is there any parameter to set the number of times the last observation can be carried forward? Any ideas welcome.

Desired output:

output <- data.table(data.frame("value" = c(5,5,8,8,NA,8,6,6,6,10),
                                "locf_N" = c(1,NA,1,NA,NA,1,2,NA,NA,2)) )

Solution

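  • cumsum(!is.na(value)) is a grouping vector that puts each non-NA value in a group with the NAs that follow it. Within each group, the first locf_N[1] + 1 positions (capped at the group size .N) are replaced with the group's first value, and the remaining positions stay NA. The trailing [, -1] drops the grouping column. Apply this to the original test, because the na.locf attempts in the question modify value in place.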

    test[, list(value = replace(value, 1:min(.N, locf_N[1] + 1), value[1]), locf_N), 
            by = cumsum(!is.na(value))][, -1]
    

    giving:

        value locf_N
     1:     5      1
     2:     5     NA
     3:     8      1
     4:     8     NA
     5:    NA     NA
     6:     8      1
     7:     6      2
     8:     6     NA
     9:     6     NA
    10:    10      2
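
The same grouping idea can also be sketched in base R, without data.table or zoo. This is only an illustration under stated assumptions: locf_limited is a hypothetical helper name (not part of any package), and it assumes the limit for each run is read from the row holding the observed value, as in the data above. Leading NAs and observations with an NA limit are left untouched.

```r
# Hypothetical helper: carry each observed value forward at most n[i] times,
# where n[i] is the limit stored on the row of the observation.
locf_limited <- function(value, n) {
  # Group id: 0 for leading NAs, then 1, 2, ... for each observation
  # together with the NAs that follow it.
  grp <- cumsum(!is.na(value))
  # Position inside the group: 0 at the observation, 1, 2, ... for the NAs after it.
  pos <- ave(seq_along(value), grp, FUN = seq_along) - 1
  # Index of the first row of each group, repeated for every row of the group.
  first <- match(grp, grp)
  last_obs <- value[first]
  limit <- n[first]
  # Fill an NA only if it lies within the allowed number of steps.
  fill <- is.na(value) & grp > 0 & !is.na(limit) & pos <= limit
  value[fill] <- last_obs[fill]
  value
}

value  <- c(5, NA, 8, NA, NA, 8, 6, NA, NA, 10)
locf_N <- c(1, NA, 1, NA, NA, 1, 2, NA, NA, 2)
result <- locf_limited(value, locf_N)
print(result)
```

For the example data this returns the value column of the desired output, with row 5 still NA because the limit for the preceding 8 is 1.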