Search code examples
rdataframefunctiondplyrtidyverse

Paste the names of a columns in which there is a "Y" in R


In my DATA below, I want to create a New_Column which pastes the names of columns A:F in which there is a Y to achieve my Desired_Output.

I was wondering if this might be possible in R?

DATA <- read.table(h=TRUE,t="
ID  A   B   C   D   E   F
1   Y   Y   N   N   N   N
2   N   N   N   Y   N   N
3   Y   N   Y   N   Y   N
4   Y   Y   N   N   N   N
5   N   N   N   Y   N   N
6   Y   Y   Y   N   N   N
7   N   N   N   N   N   N
8   NA  NA  NA  NA  NA  NA")

Desired_Output <- read.table(h=TRUE,t="
ID  A   B   C   D   E   F   New_Column
1   Y   Y   N   N   N   N   A_B
2   N   N   N   Y   N   N   D
3   Y   N   Y   N   Y   N   A_C_E
4   Y   Y   N   N   N   N   A_B
5   N   N   N   Y   N   N   D
6   Y   Y   Y   N   N   N   A_B_C
7   N   N   N   N   N   N   N
8   NA  NA  NA  NA  NA  NA  NA")

Solution

  • You can do this in a couple of steps with the tidyverse:

    library(tidyverse)
    
    new_col <- DATA |> 
      pivot_longer(-ID) |>
      filter(value == 'Y') |> 
      group_by(ID) |> 
      summarize(new_column = paste(name, collapse = '_'))
    
    new_df <- left_join(DATA, new_col, by = 'ID')
    
      ID    A    B    C    D    E    F new_column
    1  1    Y    Y    N    N    N    N        A_B
    2  2    N    N    N    Y    N    N          D
    3  3    Y    N    Y    N    Y    N      A_C_E
    4  4    Y    Y    N    N    N    N        A_B
    5  5    N    N    N    Y    N    N          D
    6  6    Y    Y    Y    N    N    N      A_B_C
    7  7    N    N    N    N    N    N       <NA>
    8  8 <NA> <NA> <NA> <NA> <NA> <NA>       <NA>