Search code examples
rgroupingnafill

Fill NAs conditional on value


I have a data.frame with a column filled with some NAs, and ids.

x <- c(NA, NA, NA, NA, "A", NA, NA, "B", "A", NA, NA, NA, NA, "B", NA, "A")  
id <- rep(1:3, c(8, 6, 2))
df <- data.frame(id = id, x = x)

#    id    x
# 1   1 <NA>
# 2   1 <NA>
# 3   1 <NA>
# 4   1 <NA>
# 5   1    A
# 6   1 <NA>
# 7   1 <NA>
# 8   1    B
# 9   2    A
# 10  2 <NA>
# 11  2 <NA>
# 12  2 <NA>
# 13  2 <NA>
# 14  2    B
# 15  3 <NA>
# 16  3    A

I'd like to fill the x column by id. But the fill should be of different lengths depending on whether "A" or "B" are being used.

For "A", I'd like the 2 NAs before to be replaced by "A", for "B", the 3 NAs before, so that the expected output should be:

#    id    x
# 1   1 <NA>
# 2   1 <NA>
# 3   1    A
# 4   1    A
# 5   1    A
# 6   1    B
# 7   1    B
# 8   1    B
# 9   2    A
# 10  2 <NA>
# 11  2    B
# 12  2    B
# 13  2    B
# 14  2    B
# 15  3    A
# 16  3    A

I use this when I have the same number of NAs to be filled, but I'm not sure how to transpose this into 2 different values depending on x. (I tried with ifelse, without success).

setDT(df)[, x := fcoalesce(shift(x, -(0:4))), by = "id"]

#With ifelse?
setDT(df)[, x := ifelse(x == "A", fcoalesce(shift(x, -(0:2))), fcoalesce(shift(x, -(0:1)))), by = "id"]

Fast solutions, using data.table or collapse are preferred. This will be used in a data.frame of over 8M rows and 100K groups.


Solution

  • I think this will work:

    library(data.table)
    setDT(df)[, x := fifelse(fcoalesce(shift(x, -(0:3))) == "B", "B", fcoalesce(shift(x, -(0:2)))), by = "id"][]
    
        id    x
     1:  1 <NA>
     2:  1 <NA>
     3:  1    A
     4:  1    A
     5:  1    A
     6:  1    B
     7:  1    B
     8:  1    B
     9:  2    A
    10:  2 <NA>
    11:  2    B
    12:  2    B
    13:  2    B
    14:  2    B
    15:  3    A
    16:  3    A