Search code examples
rmutate

set column value to first value meeting conditions across columns


I have a dataframe with columns that start with "dx". I want to create another column called primary. If the value in all of the columns that start with "dx" is "I629" or NA, I want the value of primary to be "Unspecified". Otherwise, I want the it to be the first non-"I629" value.

My desired output:

dx1  dx2  dx3 dx4 dx5 primary
I629 NA   NA  NA  NA  Unspecified
S065 NA   NA  NA  NA  S065
I629 S066 NA  NA  NA  S066
I629 I629 NA  NA  NA  Unspecified

Solution

  • A tidyverse solution: create a helper dataframe where "I629" is replaced with NA across all Dx columns; use dplyr::coalesce() to take the first non-NA value (or "Unspecified" if all NA); and finally bind the new primary column to your original dataframe.

    library(dplyr)
    library(tidyr)
    
    primary_dx <- dat %>% 
      mutate(
        across(starts_with("dx"), \(col) na_if(col, "I629")),
        primary = coalesce(!!!select(., starts_with("dx")), "Unspecified")
      ) %>% 
      select(primary)
    
    bind_cols(dat, primary_dx)
    
    # A tibble: 4 × 6
      dx1   dx2   dx3   dx4   dx5   primary    
      <chr> <chr> <lgl> <lgl> <lgl> <chr>      
    1 I629  NA    NA    NA    NA    Unspecified
    2 S065  NA    NA    NA    NA    S065       
    3 I629  S066  NA    NA    NA    S066       
    4 I629  I629  NA    NA    NA    Unspecified