Is there a good way to carry the last observation in a vector both forward and backward n times? Example vector to demonstrate:
Before change:
vector <- c(NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA, 3, NA, NA, NA, NA)
After change, for n=2:
vector <- c(NA, NA, NA, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, NA, NA, 3, 3, 3, 3, 3, NA, NA)
dplyr::fill() doesn't seem to have a way to specify the number of filled rows, and zoo::na.locf() has a locb option, but only if you do not specify the number of rows you would like filled.
If there is a way to do this such that the locb and locf could be set to two different values, e.g., 1 and 3, that would be perfect for me. If there's no easy way to do that, then a single specified number of rows for both locb and locf would do. Thanks for any help! I usually work in dplyr but will accept any sort of solution, as this problem is really stumping me.
I think a simple for loop would help you accomplish this cleanly:
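roll <- function(x, n) {
  # positions of the observed (non-NA) values
  idx <- which(!is.na(x))
  # spread each value n steps back and forward, clamped to the vector bounds
  for (i in idx) x[pmax(i - n, 1):pmin(i + n, length(x))] <- x[i]
  return(x)
}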
roll(vector, 2)
# [1] NA NA NA 1 1 1 1 1 2 2 2 2 2 NA NA 3 3 3 3 3 NA NA
The purpose of pmin and pmax is to preserve the length of your vector. For example, if you had a value in the last element and n = 2, you would not want to add two additional elements to your vector (see the first column, last row of the dplyr example below). pmax does the same job at the other end, keeping the start index from running past the beginning of the vector.
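The question also asks about carrying a different number of steps backward and forward. Here is a minimal sketch of that, assuming a hypothetical helper roll2 with arguments n_back and n_fwd (these names are mine, not from any package). It follows the same loop as roll, so later values still overwrite earlier ones where ranges overlap:

```r
# Hypothetical variant of roll(): separate backward and forward distances
roll2 <- function(x, n_back, n_fwd) {
  idx <- which(!is.na(x))
  for (i in idx) {
    from <- max(i - n_back, 1)       # clamp to the first element
    to <- min(i + n_fwd, length(x))  # clamp to the last element
    x[from:to] <- x[i]
  }
  x
}

v <- c(NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA, 3, NA, NA, NA, NA)
roll2(v, n_back = 1, n_fwd = 3)
# [1] NA NA NA NA 1 1 1 1 1 2 2 2 2 2 NA NA 3 3 3 3 3 NA
```

Calling roll2(v, 2, 2) should give the same result as roll(v, 2).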
This function can then be easily applied within dplyr:
set.seed(123)
df <- replicate(5, sample(c(1:4, NA), 20, replace = TRUE, prob = c(rep(0.02, 4), .92))) |>
  data.frame()
library(dplyr)
df |>
  mutate(across(where(is.numeric), ~ roll(.x, 2)))
# X1 X2 X3 X4 X5
# 1 NA NA NA NA NA
# 2 NA 1 NA NA NA
# 3 4 1 NA NA NA
# 4 4 1 NA NA NA
# 5 4 1 NA NA 1
# 6 4 1 NA NA 1
# 7 4 NA NA NA 1
# 8 NA NA NA NA 1
# 9 4 2 NA NA 1
# 10 4 2 NA NA NA
# 11 4 2 NA NA NA
# 12 4 2 NA NA NA
# 13 4 2 NA NA NA
# 14 NA NA NA NA NA
# 15 NA NA NA NA NA
# 16 NA NA NA NA NA
# 17 NA NA NA NA NA
# 18 4 NA NA NA NA
# 19 4 NA NA NA NA
# 20 4 NA NA NA NA
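If you would rather not depend on dplyr, the same column-wise application can be sketched in base R with lapply. The small data frame df_small below is made up for illustration, and roll is repeated so the snippet runs on its own:

```r
# roll() as defined above, repeated so this snippet is self-contained
roll <- function(x, n) {
  idx <- which(!is.na(x))
  for (i in idx) x[max(i - n, 1):min(i + n, length(x))] <- x[i]
  x
}

# Hypothetical example data
df_small <- data.frame(
  a = c(NA, 1, NA, NA, NA),
  b = c(NA, NA, NA, 2, NA)
)

# Assigning into df_small[] keeps the data.frame structure
df_small[] <- lapply(df_small, roll, n = 1)
df_small
#    a  b
# 1  1 NA
# 2  1 NA
# 3  1  2
# 4 NA  2
# 5 NA  2
```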
I think it is useful to note that later values take precedence. For example, if n is specified so that values carried forward and backward overlap, then later values will overwrite earlier ones:
roll(vector, 3)
# [1] NA NA 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 NA
If you carry too far then values will be overwritten before their "turn" (here 2 is overwritten by 1 before it has a chance to be carried):
roll(vector, 5)
# [1] 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3
These behaviors can be modified, but are the default with this function, FYI.
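As one possible modification, here is a sketch that never overwrites an original observation. It assumes a hypothetical helper roll_keep (my name, not an existing function) that reads from the untouched input and only fills positions that were NA to begin with. Later values still win where the filled ranges overlap:

```r
# Hypothetical variant: only fill positions that were NA in the original vector
roll_keep <- function(x, n) {
  out <- x
  idx <- which(!is.na(x))
  for (i in idx) {
    rng <- max(i - n, 1):min(i + n, length(x))
    fill <- rng[is.na(x[rng])]  # skip positions holding an original value
    out[fill] <- x[i]           # x is never modified, so x[i] is always the original
  }
  out
}

v <- c(NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA, 3, NA, NA, NA, NA)
roll_keep(v, 5)
# [1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
```

With this version the 2 survives for n = 5, because the loop always reads values from the original x rather than from the partially filled result.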