I have two data frames that look like: Df1: `
ID ID_T1 ID_T2 ID_T3 ID_T4
1 101 NA NA NA NA
2 102 11111 11111 11111 11111
3 103 22222 22222 NA NA
4 104 NA NA NA NA
5 105 44444 44445 NA NA
Note. NAs included and IDs sometimes differ across rows.
Df2 (or better a vector)
ID_T5
1 44445
2 11111
3 22222
4 99945
Note. Newly assigned IDs present.
I need the following output
ID ID_T1 ID_T2 ID_T3 ID_T4 ID_T5
1 101 NA NA NA NA NA
2 102 11111 11111 11111 11111 11111
3 103 22222 22222 NA NA 22222
4 104 NA NA NA NA NA
5 105 44444 44445 NA NA 44445
6 NA NA NA NA NA 99945
So the package needs to check across ID_T1—ID_T4 whether the value is already existing and add it as a separate column and add additional rows if the value is not present in the df1.
I would really appreciate your help.
I have already spent hours with dplyr and messed around with filter and mutate, without being successful.
A bit clumsy, but it works:
library(dplyr)
Df3 <- cross_join(Df1, Df2) |>
filter(ID_T5==ID_T1 | ID_T5==ID_T2 | ID_T5==ID_T3 | ID_T5==ID_T4)
bind_rows(Df3, anti_join(Df1, Df3)) |>
full_join(Df2) |>
arrange(ID)
ID ID_T1 ID_T2 ID_T3 ID_T4 ID_T5
1 101 NA NA NA NA NA
2 102 11111 11111 11111 11111 11111
3 103 22222 22222 NA NA 22222
4 104 NA NA NA NA NA
5 105 44444 44445 NA NA 44445
6 NA NA NA NA NA 99945
Data:
Df1:
structure(list(ID = 101:105, ID_T1 = c(NA, 11111, 22222, NA,
44444), ID_T2 = c(NA, 11111, 22222, NA, 44445), ID_T3 = c(NA,
11111, NA, NA, NA), ID_T4 = c(NA, 11111, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-5L))
Df2:
structure(list(ID_T5 = c(44445, 11111, 22222, 99945)), class = "data.frame", row.names = c(NA,
-4L))