
How to impute the distance to the nearest non-NA value


I'd like to fill missing values with a "row distance" to the nearest non-NA value. In other words, how would I convert column x in this sample dataframe into column y?

#    x y
#1   0 0
#2  NA 1
#3   0 0
#4  NA 1
#5  NA 2
#6  NA 1
#7   0 0
#8  NA 1
#9  NA 2
#10 NA 3
#11 NA 2
#12 NA 1
#13  0 0

I can't find the right combination of dplyr group_by(), mutate() and row_number() calls to do this. The imputation packages I've looked at are designed for more complicated scenarios, where missing values are imputed from statistics and other variables.

d <- data.frame(x = c(0, NA, 0, rep(NA, 3), 0, rep(NA, 5), 0),
                y = c(0, 1, 0, 1, 2, 1, 0, 1, 2, 3, 2, 1, 0))

Solution

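  • We can use sapply() to take, for each row, the minimum absolute difference between its index and the indices of the non-NA values: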

    nonNA <- which(!is.na(d$x))  # row indices of the non-missing values
    d$z <- sapply(seq_along(d$x), function(i) min(abs(i - nonNA)))
    #     x y z
    # 1   0 0 0
    # 2  NA 1 1
    # 3   0 0 0
    # 4  NA 1 1
    # 5  NA 2 2
    # 6  NA 1 1
    # 7   0 0 0
    # 8  NA 1 1
    # 9  NA 2 2
    # 10 NA 3 3
    # 11 NA 2 2
    # 12 NA 1 1
    # 13  0 0 0
    

    If you want to do this in dplyr, wrap the sapply() call in mutate():

    library(dplyr)

    d %>%
       mutate(z = sapply(seq_along(x), function(i) min(abs(i - which(!is.na(x))))))
    

    Or, additionally using library(purrr) (thanks to @Onyambu):

    library(dplyr)
    library(purrr)

    d %>% mutate(m = map_dbl(seq_len(n()), ~ min(abs(.x - which(!is.na(x))))))
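The sapply()/map_dbl() versions compare every row with every non-NA position, so the work grows with the number of rows times the number of non-NA values. For long vectors, a linear-time base R alternative might look like the following sketch (the helper name dist_to_non_na is hypothetical, not from any package). It records, for each row, the index of the last non-NA value at or before it (a forward cummax()) and of the next non-NA value at or after it (a backward cummin()), then keeps the smaller of the two distances.

```r
# Hypothetical helper: distance (in rows) from each position to the nearest non-NA value.
# Linear time: one forward and one backward cumulative pass.
dist_to_non_na <- function(x) {
  idx <- seq_along(x)
  ok  <- !is.na(x)
  # Index of the most recent non-NA at or before each row (-Inf if there is none yet)
  prev <- cummax(ifelse(ok, idx, -Inf))
  # Index of the next non-NA at or after each row (Inf if there is none left)
  nxt  <- rev(cummin(rev(ifelse(ok, idx, Inf))))
  # Nearest of the two; rows with no non-NA value on either side get Inf
  pmin(idx - prev, nxt - idx)
}

d <- data.frame(x = c(0, NA, 0, rep(NA, 3), 0, rep(NA, 5), 0),
                y = c(0, 1, 0, 1, 2, 1, 0, 1, 2, 3, 2, 1, 0))
d$z <- dist_to_non_na(d$x)
all.equal(d$z, d$y)
# [1] TRUE
```

If x contains no non-NA values at all, every row gets Inf, which is also what min() returns for an empty set in the sapply() version (there, with a warning).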
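Since the question was reaching for group_by() and row_number(), here is a sketch of that route as an alternative (the helper columns run, last_run, dist_prev and dist_next are made up for illustration). Each non-NA value starts a new run, so within a run the distance back to the previous non-NA value is row_number() - 1 and the distance forward to the next one is n() - row_number() + 1. Leading NAs (run 0) have no previous non-NA value and trailing NAs (the last run) have no next one, so the missing side is set to Inf.

```r
library(dplyr)

d <- data.frame(x = c(0, NA, 0, rep(NA, 3), 0, rep(NA, 5), 0),
                y = c(0, 1, 0, 1, 2, 1, 0, 1, 2, 3, 2, 1, 0))

d2 <- d %>%
  # run id increments at every non-NA value, so each run opens with a non-NA row;
  # run 0 holds any NAs that come before the first non-NA value
  mutate(run = cumsum(!is.na(x)),
         last_run = max(run)) %>%
  group_by(run) %>%
  mutate(
    # rows since the non-NA value that opens this run (none exists for run 0)
    dist_prev = if_else(run == 0, Inf, row_number() - 1),
    # rows until the non-NA value that opens the next run (none after the last run)
    dist_next = if_else(run == last_run, Inf, n() - row_number() + 1),
    z = pmin(dist_prev, dist_next)
  ) %>%
  ungroup() %>%
  select(x, y, z)

d2
```

This needs only one pass per group, but it is more verbose than the sapply() one-liner; for short data frames the original answer is likely the simpler choice.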