I have a data.frame with a column filled with some NAs, and ids.
x <- c(NA, NA, NA, NA, "A", NA, NA, "B", "A", NA, NA, NA, NA, "B", NA, "A")
id <- rep(1:3, c(8, 6, 2))
df <- data.frame(id = id, x = x)
# id x
# 1 1 <NA>
# 2 1 <NA>
# 3 1 <NA>
# 4 1 <NA>
# 5 1 A
# 6 1 <NA>
# 7 1 <NA>
# 8 1 B
# 9 2 A
# 10 2 <NA>
# 11 2 <NA>
# 12 2 <NA>
# 13 2 <NA>
# 14 2 B
# 15 3 <NA>
# 16 3 A
I'd like to fill the x
column by id
. But the fill should be of different lengths depending on whether "A"
or "B"
are being used.
For "A"
, I'd like the 2 NAs before to be replaced by "A"
, for "B"
, the 3 NAs before, so that the expected output should be:
# id x
# 1 1 <NA>
# 2 1 <NA>
# 3 1 A
# 4 1 A
# 5 1 A
# 6 1 B
# 7 1 B
# 8 1 B
# 9 2 A
# 10 2 <NA>
# 11 2 B
# 12 2 B
# 13 2 B
# 14 2 B
# 15 3 A
# 16 3 A
I use this when I have the same number of NAs to be filled, but I'm not sure how to transpose this into 2 different values depending on x
. (I tried with ifelse, without success).
setDT(df)[, x := fcoalesce(shift(x, -(0:4))), by = "id"]
#With ifelse?
setDT(df)[, x := ifelse(x == "A", fcoalesce(shift(x, -(0:2))), fcoalesce(shift(x, -(0:1)))), by = "id"]
Fast solutions, using data.table
or collapse
are preferred. This will be used in a data.frame of over 8M rows and 100K groups.
I think this will work:
library(data.table)
setDT(df)[, x := fifelse(fcoalesce(shift(x, -(0:3))) == "B", "B", fcoalesce(shift(x, -(0:2)))), by = "id"][]
id x
1: 1 <NA>
2: 1 <NA>
3: 1 A
4: 1 A
5: 1 A
6: 1 B
7: 1 B
8: 1 B
9: 2 A
10: 2 <NA>
11: 2 B
12: 2 B
13: 2 B
14: 2 B
15: 3 A
16: 3 A