I need to loop over IDs in a dataframe to fill NA values in a column by attributing empty cells evenly between the last and first filled entry outside of the NA cells.
ID Value X Y
1 A x y
1 NA x y
1 NA x y
1 NA x y
1 NA x y
1 NA x y
1 B x y
2 C x y
2 NA x y
2 NA x y
2 NA x y
2 NA x y
2 D x y
Which should be filled to this:
ID Value X Y
1 A x y
1 A x y
1 A x y
1 B x y
1 B x y
1 B x y
1 B x y
2 C x y
2 C x y
2 C x y
2 D x y
2 D x y
2 D x y
In case of 2n NA values between observations, n is attributed to the last and n to the next. In case of 2n+1 values, n is attributed to the last and n+1 to the next.
I know I need to use na.locf
from the zoo
package which works well with a large database for filling in empty values based on the last non-empty cell, along with the fromLast
argument to perform "next observation carried backwards". I cannot however structure a loop to account for an even or odd number of NA values, and use both of these together.
Using the tidyverse package,
> library(tidyr)
> library(dplyr)
> df %>% dplyr::group_by(test$id) %>% fill(Value, .direction ="downup") %>% dplyr::ungroup()
This fills in NA values in both directions but does not account for different border values for NA cells in a group.
Define interp which replaces each successive non-NA with successive integers, applies na.appro9x, rounds and replaces the resulting integers with the original values.
library(zoo)
interp <- function(x) {
x0 <- ifelse(is.na(x), NA, cumsum(!is.na(x)))
xx <- na.approx(x0, rule = 2)
na.omit(x)[round(xx)]
}
transform(DF, Value = interp(Value))
giving:
ID Value X Y
1 1 A x y
2 1 A x y
3 1 A x y
4 1 B x y
5 1 B x y
6 1 B x y
7 1 B x y
8 2 C x y
9 2 C x y
10 2 C x y
11 2 D x y
12 2 D x y
13 2 D x y
It is assumed that the input is the following, shown in reproducible form.
Lines <- "ID Value X Y
1 A x y
1 NA x y
1 NA x y
1 NA x y
1 NA x y
1 NA x y
1 B x y
2 C x y
2 NA x y
2 NA x y
2 NA x y
2 NA x y
2 D x y"
DF <- read.table(text = Lines, header = TRUE)