Search code examples
rtime-seriesxtszoo

Fill NA in a time series only to a limited number


Is there a way we can fill NAs in a zoo or xts object with limited number of NAs forward. In other words like fill NAs up to 3 consecutive NAs, and then keep the NAs from the 4th value on until a valid number.

Something like this.

library(zoo)
x <- zoo(1:20, Sys.Date() + 1:20)
x[c(2:4, 6:10, 13:18)] <- NA
x

2014-09-20 2014-09-21 2014-09-22 2014-09-23 2014-09-24 2014-09-25 2014-09-26 
         1         NA         NA         NA          5         NA         NA 
2014-09-27 2014-09-28 2014-09-29 2014-09-30 2014-10-01 2014-10-02 2014-10-03 
        NA         NA         NA         11         12         NA         NA 
2014-10-04 2014-10-05 2014-10-06 2014-10-07 2014-10-08 2014-10-09 
        NA         NA         NA         NA         19         20

Desired output, will be something with variable n = 3 is

2014-09-20 2014-09-21 2014-09-22 2014-09-23 2014-09-24 2014-09-25 2014-09-26 
         1         1         1        1          5         5        5 
2014-09-27 2014-09-28 2014-09-29 2014-09-30 2014-10-01 2014-10-02 2014-10-03 
        5         NA         NA         11         12         12        12 
2014-10-04 2014-10-05 2014-10-06 2014-10-07 2014-10-08 2014-10-09 
        12         NA         NA         NA         19         20

I have tried lot of combination with na.locf(x, maxgap = 3) etc without much success. I can create a loop to get the desired output, I was wondering whether there is vectorized way of achieving this.

fillInTheBlanks <- function(v, n=3) {
  result <- v
  counter0 <- 1
  for(i in 2:length(v)) {
    value <- v[i]
    if (is.na(value)) {
      if (counter0 > n) {
        result[i] <- v[i]
      } else {  
        result[i] <- result[i-1]
        counter0 <- counter0 + 1
      } }   
    else {
      result[i] <- v[i] 
      counter0 <- 1
    }
  }
  return(result)
}

Thanks


Solution

  • Here's another way:

    l <- cumsum(! is.na(x))
    c(NA, x[! is.na(x)])[replace(l, ave(l, l, FUN=seq_along) > 4, 0) + 1]
    # [1]  1  1  1  1  5  5  5  5 NA NA 11 12 12 12 12 NA NA NA 19 20
    

    edit: my previous answer required that x have no duplicates. The current answer does not.

    benchmarks

    x <- rep(x, length.out=1e4)
    
    plourde <- function(x) {
        l <- cumsum(! is.na(x))
        c(NA, x[! is.na(x)])[replace(l, ave(l, l, FUN=seq_along) > 4, 0) + 1]
    }
    
    agstudy <- function(x) {
        unlist(sapply(split(coredata(x),cumsum(!is.na(x))),
               function(sx){
                 if(length(sx)>3) 
                   sx[2:4] <- rep(sx[1],3)
                 else sx <- rep(sx[1],length(sx))
                 sx
               }))
    }
    
    microbenchmark(plourde(x), agstudy(x))
    # Unit: milliseconds
    #        expr   min     lq median     uq   max neval
    #  plourde(x)  5.30  5.591  6.409  6.774 57.13   100
    #  agstudy(x) 16.04 16.249 16.454 17.516 20.64   100