Search code examples
rmatchtidyverseacross

How to match a column across multiple columns and return matching col_name in a new column


Let's assume I have a dataset df wherein I want to match value of col X across values of multiple columns A to F and want to return the matching column name (else NA) in a new column.

dput

df <- structure(list(A = c(4L, NA, NA, NA), B = c(NA, 5L, NA, NA), 
    C = c(NA, NA, NA, NA), D = c(NA, 4L, 6L, 7L), E = c(5L, NA, 
    NA, NA), F = c(NA, NA, NA, NA), X = 4:7), class = "data.frame", row.names = c(NA, 
-4L))

> df
   A  B  C  D  E  F X
1  4 NA NA NA  5 NA 4
2 NA  5 NA  4 NA NA 5
3 NA NA NA  6 NA NA 6
4 NA NA NA  7 NA NA 7

The output I want

> df_out
   A  B  C  D  E  F X new
1  4 NA NA NA  5 NA 4   A
2 NA  5 NA  4 NA NA 5   B
3 NA NA NA  6 NA NA 6   D
4 NA NA NA  7 NA NA 7   D

I will prefer dplyr/tidyverse syntax that I'll integrate into my existing syntax.


Solution

  • One option could be:

    df %>%
     rowwise() %>%
     mutate(new = names(.)[which(c_across(-X) %in% X)])
    
          A     B C         D     E F         X new  
      <int> <int> <lgl> <int> <int> <lgl> <int> <chr>
    1     4    NA NA       NA     5 NA        4 A    
    2    NA     5 NA        4    NA NA        5 B    
    3    NA    NA NA        6    NA NA        6 D    
    4    NA    NA NA        7    NA NA        7 D
    

    The solution above assumes that the column names correspond to the positions established by which(). However, if it is not the case (e.g. c_across(-c(C, E, X)), then the results won't be correct. A solution for more complex situations could be:

    df %>%
     mutate(new = Reduce(coalesce, across(-c(C, E, X), ~ ifelse(. == X, cur_column(), NA_character_))))
                              
       A  B  C  D  E  F X new
    1  4 NA NA NA  5 NA 4   A
    2 NA  5 NA  4 NA NA 5   B
    3 NA NA NA  6 NA NA 6   D
    4 NA NA NA  7 NA NA 7   D