R last observation carried forwards and backwards up to n rows


Is there a good way to carry the last observation both forward and backward, up to n rows in each direction? An example vector, to demonstrate:

Before change:

vector <- c(NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA, 3, NA, NA, NA, NA)

After change, for n=2:

vector <- c(NA, NA, NA, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, NA, NA, 3, 3, 3, 3, 3, NA, NA)

dplyr::fill() doesn't seem to have a way to specify the number of filled rows, and zoo::na.locf() has a locb option, but only if you do not specify the number of rows you would like filled.

If the locb and locf distances could be set to two different values, e.g. 1 and 3, that would be perfect for me. If there's no easy way to do that, then a single specified number of rows for both locb and locf would do. Thanks for any help! I usually work in dplyr but will accept any sort of solution, as this problem is really stumping me.


Solution

  • I think a simple for loop would help you accomplish this cleanly:

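    # Carry each non-NA value up to n positions backwards and forwards.
    roll <- function(x, n) {
      idx <- which(!is.na(x))  # positions of observed values, found before any filling
      # Clamp each fill window to the valid index range 1..length(x).
      for (i in idx) x[pmax(i - n, 1):pmin(i + n, length(x))] <- x[i]
      return(x)
    }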
    
    roll(vector, 2)
    # [1] NA NA NA  1  1  1  1  1  2  2  2  2  2 NA NA  3  3  3  3  3 NA NA
    

    The purpose of pmin and pmax is to keep the fill window inside the vector so that its length is preserved. For example, if you had a value in the last element and n = 2, you would not want to add two additional elements to your vector (see the first column, last row of the dplyr example below).
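
    As a quick illustration of why the clamp matters, here is a minimal sketch on a made-up five-element vector (x, y, lo and hi are names used only for this example). It relies on the fact that R silently grows a vector when you assign past its end:

    ```r
    x <- c(1, NA, NA, NA, 9)
    n <- 2
    i <- 5                        # position of the last observed value

    # Clamped window: stays within 1..length(x).
    lo <- max(i - n, 1)           # 3
    hi <- min(i + n, length(x))   # 5, not 7
    x[lo:hi] <- x[i]
    x
    # [1]  1 NA  9  9  9
    length(x)
    # [1] 5

    # Unclamped window: assigning to positions 6 and 7 extends the vector.
    y <- c(1, NA, NA, NA, 9)
    y[(i - n):(i + n)] <- y[i]
    length(y)
    # [1] 7
    ```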

    This function can then be easily applied within dplyr:

    set.seed(123)
    df <- replicate(5, sample(c(1:4, NA), 20, replace = TRUE, prob = c(rep(0.02, 4), .92))) |>
      data.frame()
    
    library(dplyr)
    
    df |>
      mutate(across(where(is.numeric), ~ roll(.x, 2)))
    #    X1 X2 X3 X4 X5
    # 1  NA NA NA NA NA
    # 2  NA  1 NA NA NA
    # 3   4  1 NA NA NA
    # 4   4  1 NA NA NA
    # 5   4  1 NA NA  1
    # 6   4  1 NA NA  1
    # 7   4 NA NA NA  1
    # 8  NA NA NA NA  1
    # 9   4  2 NA NA  1
    # 10  4  2 NA NA NA
    # 11  4  2 NA NA NA
    # 12  4  2 NA NA NA
    # 13  4  2 NA NA NA
    # 14 NA NA NA NA NA
    # 15 NA NA NA NA NA
    # 16 NA NA NA NA NA
    # 17 NA NA NA NA NA
    # 18  4 NA NA NA NA
    # 19  4 NA NA NA NA
    # 20  4 NA NA NA NA
    

    Note that later values take precedence: if n is large enough that the ranges carried forward and backward overlap, later values overwrite earlier ones:

    roll(vector, 3)
    # [1] NA NA  1  1  1  1  1  2  2  2  2  2  2  2  3  3  3  3  3  3  3 NA
    

    If n is large enough that a fill window reaches a later observed value, that value is overwritten before its "turn" (here 2 is overwritten by 1 before it has a chance to be carried):

    roll(vector, 5)
    # [1] 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3
    

    Both behaviors follow from how this function is written; they can be changed if you need different precedence rules.
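
    One possible modification, sketched here as an assumption rather than as part of the original function: a hypothetical roll2() that takes separate backward and forward distances (as asked in the question) and only fills positions that are still NA, so observed values are never overwritten and, where windows overlap, the earlier value keeps the gap. The names roll2, n_back and n_fwd are made up for this example.

    ```r
    # Hypothetical variant: carry each observed value up to n_back positions
    # backwards and n_fwd positions forwards, filling only gaps (NA positions).
    roll2 <- function(x, n_back, n_fwd = n_back) {
      out <- x
      for (i in which(!is.na(x))) {
        # Clamp the window to the valid index range 1..length(x).
        window <- max(i - n_back, 1):min(i + n_fwd, length(x))
        # Keep only positions that are still NA in the result so far.
        fill <- window[is.na(out[window])]
        out[fill] <- x[i]
      }
      out
    }

    vector <- c(NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA, 3, NA, NA, NA, NA)

    roll2(vector, n_back = 1, n_fwd = 3)
    # [1] NA NA NA NA  1  1  1  1  1  2  2  2  2  2 NA NA  3  3  3  3  3 NA

    roll2(vector, 5)
    # [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3
    ```

    With this variant the 2 at position 11 survives roll2(vector, 5) because the loop never assigns to a position that already holds a value. Whether earlier or later values should win in an overlap is a design choice; iterating over rev(which(!is.na(x))) instead would give later values priority for the gaps.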