I have a dataframe with columns that start with "dx". I want to create another column called primary
. If the value in all of the columns that start with "dx" is "I629"
or NA
, I want the value of primary
to be "Unspecified"
. Otherwise, I want the it to be the first non-"I629"
value.
My desired output:
dx1 dx2 dx3 dx4 dx5 primary
I629 NA NA NA NA Unspecified
S065 NA NA NA NA S065
I629 S066 NA NA NA S066
I629 I629 NA NA NA Unspecified
A tidyverse solution: create a helper dataframe where "I629"
is replaced with NA
across all Dx
columns; use dplyr::coalesce()
to take the first non-NA
value (or "Unspecified"
if all NA
); and finally bind the new primary
column to your original dataframe.
library(dplyr)
library(tidyr)
primary_dx <- dat %>%
mutate(
across(starts_with("dx"), \(col) na_if(col, "I629")),
primary = coalesce(!!!select(., starts_with("dx")), "Unspecified")
) %>%
select(primary)
bind_cols(dat, primary_dx)
# A tibble: 4 × 6
dx1 dx2 dx3 dx4 dx5 primary
<chr> <chr> <lgl> <lgl> <lgl> <chr>
1 I629 NA NA NA NA Unspecified
2 S065 NA NA NA NA S065
3 I629 S066 NA NA NA S066
4 I629 I629 NA NA NA Unspecified