Search code examples
rdplyr

How can I paste the column name into a new column depending on another column's value in R


I'm trying to paste the name of a column ("diagnosis") into a new column ("suspected diagnosis" or susp_dx) if it has a 1 in it:

practice <- data.frame(record_id = 1:5,
                  susp_inflamm = c(1, 0, 0, 1, 0),
                  susp_bacteria = c(1, 0, 0, 0, 0),
                  susp_virus = c(0, 1, 0, 0, 0),
                  susp_fungus = c(0, 0, 1, 0, 0))

Some rows will have more than one diagnosis. practice$susp_dx would therefore contain susp_inflamm and susp_bacteria for record 1 for example.

practice$susp_dx <- names(practice[,c(2:5)])[which(practice[,c(2:5)] == 1)]

I've tried the above based on previous answers on here but it seems to apply column-wise or cycle through.

Ideally I'd prefer to use dplyr rather than base r to do it. I've tried the line below but starts_with needs a selecting function. I've tried using across to sum a row and then later mutate but that doesn't really do what I want.

susp_practice <- practice %>% 
  mutate(susp_dx = ifelse(where(is.double) & (starts_with("susp")) == 1), names(.), 0))

Any suggestions please?


Solution

  • dplyr by itself is not well-suited for this without reshaping the data back and forth, which makes the presence of other columns something to deal with, and larger data will be a bother.

    However, this can work:

    library(dplyr)
    library(tidyr) # pivot_longer
    practice |>
      pivot_longer(-record_id) |>
      filter(value > 0) |>
      summarize(.by = record_id, quux = paste(name, collapse = ", ")) |>
      full_join(practice, by = "record_id")
    # # A tibble: 5 × 6
    #   record_id quux                        susp_inflamm susp_bacteria susp_virus susp_fungus
    #       <int> <chr>                              <dbl>         <dbl>      <dbl>       <dbl>
    # 1         1 susp_inflamm, susp_bacteria            1             1          0           0
    # 2         2 susp_virus                             0             0          1           0
    # 3         3 susp_fungus                            0             0          0           1
    # 4         4 susp_inflamm                           1             0          0           0
    # 5         5 NA                                     0             0          0           0
    

    We can make a bespoke function to do this within dplyr:

    fun <- function(x) apply(x > 0, 1, function(z) paste(names(x)[z], collapse = ", "))
    practice |>
      mutate(quux = fun(pick(starts_with("susp_"))))
    #   record_id susp_inflamm susp_bacteria susp_virus susp_fungus                        quux
    # 1         1            1             1          0           0 susp_inflamm, susp_bacteria
    # 2         2            0             0          1           0                  susp_virus
    # 3         3            0             0          0           1                 susp_fungus
    # 4         4            1             0          0           0                susp_inflamm
    # 5         5            0             0          0           0                            
    

    In data.table-speak, this is fairly simple using the fun defined above.

    library(data.table)
    practice[, quux := fun(.SD), .SDcols = patterns("^susp_")]
    #    record_id susp_inflamm susp_bacteria susp_virus susp_fungus                        quux
    #        <int>        <num>         <num>      <num>       <num>                      <char>
    # 1:         1            1             1          0           0 susp_inflamm, susp_bacteria
    # 2:         2            0             0          1           0                  susp_virus
    # 3:         3            0             0          0           1                 susp_fungus
    # 4:         4            1             0          0           0                susp_inflamm
    # 5:         5            0             0          0           0