I am currently trying to impute values in a vector in R. The conditions of the imputation are illustrated by the example inputs below.
# example one
input_one = c(1,NA,3,4,NA,6,NA,NA)
# example two
input_two = c(NA,NA,3,4,5,6,NA,NA)
# example three
input_three = c(NA,NA,3,4,NA,6,NA,NA)
I started writing code to detect the values that can be imputed, but I got stuck with the following:
# incomplete function to detect the values
sapply(split(!is.na(input[c(rbind(which(is.na(c(input)))-1, which(is.na(c(input)))+1))]),
rep(1:(length(!is.na(input[c(which(is.na(c(input)))-1, which(is.na(c(input)))+1)]))/2), each = 2)), all)
This however only detects the NAs which might be imputable and it only works with example one. It is incomplete and unfortunately super hard to read and understand.
Any help with this would be highly appreciated.
We can use dplyr's lag and lead functions for that:
input_three = c(NA,NA,3,4,NA,6,NA,NA)
library(dplyr)
ifelse(is.na(input_three) & lead(input_three) > lag(input_three),
(lag(input_three) + lead(input_three))/ 2,
input_three)
Returns:
[1] NA NA 3 4 5 6 NA NA
Explanation:
We use ifelse, which is the vectorized version of if. That is, everything within ifelse will be applied to each element of the vectors.
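As a small illustration of that element-wise behaviour, here is a minimal sketch. The vector x and the labels "big" and "small" are made up for this example and are not part of the question.

```r
# Hypothetical toy vector, only for illustration
x <- c(1, 5, 10)

# ifelse() evaluates the test for every element and picks the
# 'yes' or 'no' value position by position
labels <- ifelse(x > 4, "big", "small")
print(labels)
```

A plain if would only look at a single condition, which is why ifelse is the better fit when the test is a whole vector.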
First we test whether an element is NA and whether the following element is greater than the previous one. To get the previous and following elements we can use dplyr's lead and lag functions:
lag offsets a vector to the right (default is 1 step):
lag(1:5)
Returns:
[1] NA 1 2 3 4
lead offsets a vector to the left:
lead(1:5)
Returns:
[1] 2 3 4 5 NA
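Both functions take an n argument for the size of the offset. The answer only needs the default of 1, so the following is just a sketch of what a larger offset looks like (the variable names lag_two and lead_two are chosen for this example):

```r
library(dplyr)

# Shift by two positions instead of the default one
lag_two  <- lag(1:5, n = 2)   # the first two positions are padded with NA
lead_two <- lead(1:5, n = 2)  # the last two positions are padded with NA

print(lag_two)
print(lead_two)
```

In both cases the positions that have no neighbour at the requested distance are filled with NA, which matters for the test clause below.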
Now to the 'test' clause of ifelse:
is.na(input_three) & lead(input_three) > lag(input_three)
Which returns:
[1] NA NA FALSE FALSE TRUE FALSE NA NA
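The NA entries in that test vector are harmless here: where the test is NA, ifelse returns NA rather than picking a branch, and the test can only be NA at positions where the input itself is NA (for a non-missing element, is.na is FALSE and FALSE & NA is FALSE). A minimal sketch of that behaviour, with a made-up test vector:

```r
# A test vector containing TRUE, FALSE and NA (illustrative values only)
test <- c(TRUE, FALSE, NA)

# Where the test is NA, ifelse() returns NA instead of choosing a branch
res <- ifelse(test, "yes", "no")
print(res)
```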
Then, if the ifelse test evaluates to TRUE, we want to return the sum of the previous and following elements divided by 2; otherwise we return the original element.
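To check the approach against all three examples from the question, the ifelse call can be wrapped in a function. This is only a sketch: the name impute_between is hypothetical, and the body is the same expression as above with the vector passed in as x.

```r
library(dplyr)

# Hypothetical helper name; the body is the ifelse() call from the answer
impute_between <- function(x) {
  ifelse(is.na(x) & lead(x) > lag(x),  # NA with a larger value after than before
         (lag(x) + lead(x)) / 2,       # mean of the two neighbours
         x)                            # otherwise keep the original element
}

# The three example inputs from the question
input_one   <- c(1, NA, 3, 4, NA, 6, NA, NA)
input_two   <- c(NA, NA, 3, 4, 5, 6, NA, NA)
input_three <- c(NA, NA, 3, 4, NA, 6, NA, NA)

res_one   <- impute_between(input_one)
res_two   <- impute_between(input_two)
res_three <- impute_between(input_three)

print(res_one)
print(res_two)
print(res_three)
```

Note that the test uses a strict >, so an NA whose neighbours are equal or decreasing (for example c(4, NA, 4)) is left as NA. Whether that is the desired behaviour depends on the imputation conditions.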