Let's assume I have a dataset df wherein I want to match value of col X
across values of multiple columns A to F
and want to return the matching column name (else NA) in a new column.
dput
df <- structure(list(A = c(4L, NA, NA, NA), B = c(NA, 5L, NA, NA),
C = c(NA, NA, NA, NA), D = c(NA, 4L, 6L, 7L), E = c(5L, NA,
NA, NA), F = c(NA, NA, NA, NA), X = 4:7), class = "data.frame", row.names = c(NA,
-4L))
> df
A B C D E F X
1 4 NA NA NA 5 NA 4
2 NA 5 NA 4 NA NA 5
3 NA NA NA 6 NA NA 6
4 NA NA NA 7 NA NA 7
The output I want
> df_out
A B C D E F X new
1 4 NA NA NA 5 NA 4 A
2 NA 5 NA 4 NA NA 5 B
3 NA NA NA 6 NA NA 6 D
4 NA NA NA 7 NA NA 7 D
I will prefer dplyr
/tidyverse
syntax that I'll integrate into my existing syntax.
One option could be:
df %>%
rowwise() %>%
mutate(new = names(.)[which(c_across(-X) %in% X)])
A B C D E F X new
<int> <int> <lgl> <int> <int> <lgl> <int> <chr>
1 4 NA NA NA 5 NA 4 A
2 NA 5 NA 4 NA NA 5 B
3 NA NA NA 6 NA NA 6 D
4 NA NA NA 7 NA NA 7 D
The solution above assumes that the column names correspond to the positions established by which()
. However, if it is not the case (e.g. c_across(-c(C, E, X)
), then the results won't be correct. A solution for more complex situations could be:
df %>%
mutate(new = Reduce(coalesce, across(-c(C, E, X), ~ ifelse(. == X, cur_column(), NA_character_))))
A B C D E F X new
1 4 NA NA NA 5 NA 4 A
2 NA 5 NA 4 NA NA 5 B
3 NA NA NA 6 NA NA 6 D
4 NA NA NA 7 NA NA 7 D