Here I have a data that looks like this:
year <- c(2000,2001,2002,2003,2005,2006,2007,2008,2009,2010)
x <- c(1,2,3,NA,5,NA,NA,NA,9,10)
dat <- data.frame(year, x)
NA
with the nearest neighbor according to the year variable.For example, The fourth place of the data (the first NA
) takes the value from its left neighbor rather than its right neighbor because its year "2003" is closer to "2002" instead of "2005"
NA
there when it does not have nearest nonNA neighbor.For example, the seventh place of the data (the third NA) will still be NA because it does not have non-NA neighbor.
After imputing, the resulting x should be 1, 2, 3, 3, 5, 5, NA, 9, 9, 10
One option would be to make use of case_when
from tidyverse
. Essentially, if the previous row has a closer year and is not NA
, then return x
from that row. If not, then choose the row below. Or if the year is closer above but there is an NA
, then return the row below. Then, same for if the row below has a closer year, but has an NA
, then return the row above. If a row does not have an NA
, then just return x
.
library(tidyverse)
dat %>%
mutate(x = case_when(is.na(x) & !is.na(lag(x)) & year - lag(year) < lead(year) - year ~ lag(x),
is.na(x) & !is.na(lead(x)) & year - lag(year) > lead(year) - year ~ lead(x),
is.na(x) & is.na(lag(x)) ~ lead(x),
is.na(x) & is.na(lead(x)) ~ lag(x),
TRUE ~ x))
Output
year x
1 2000 1
2 2001 2
3 2002 3
4 2003 3
5 2005 5
6 2006 5
7 2007 NA
8 2008 9
9 2009 9
10 2010 10