I'm trying to paste the name of a column ("diagnosis") into a new column ("suspected diagnosis" or susp_dx) if it has a 1 in it:
practice <- data.frame(record_id = 1:5,
susp_inflamm = c(1, 0, 0, 1, 0),
susp_bacteria = c(1, 0, 0, 0, 0),
susp_virus = c(0, 1, 0, 0, 0),
susp_fungus = c(0, 0, 1, 0, 0))
Some rows will have more than one diagnosis. practice$susp_dx would therefore contain susp_inflamm and susp_bacteria for record 1 for example.
practice$susp_dx <- names(practice[,c(2:5)])[which(practice[,c(2:5)] == 1)]
I've tried the above based on previous answers on here but it seems to apply column-wise or cycle through.
Ideally I'd prefer to use dplyr rather than base r to do it. I've tried the line below but starts_with needs a selecting function. I've tried using across to sum a row and then later mutate but that doesn't really do what I want.
susp_practice <- practice %>%
mutate(susp_dx = ifelse(where(is.double) & (starts_with("susp")) == 1), names(.), 0))
Any suggestions please?
dplyr
by itself is not well-suited for this without reshaping the data back and forth, which makes the presence of other columns something to deal with, and larger data will be a bother.
However, this can work:
library(dplyr)
library(tidyr) # pivot_longer
practice |>
pivot_longer(-record_id) |>
filter(value > 0) |>
summarize(.by = record_id, quux = paste(name, collapse = ", ")) |>
full_join(practice, by = "record_id")
# # A tibble: 5 × 6
# record_id quux susp_inflamm susp_bacteria susp_virus susp_fungus
# <int> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 1 susp_inflamm, susp_bacteria 1 1 0 0
# 2 2 susp_virus 0 0 1 0
# 3 3 susp_fungus 0 0 0 1
# 4 4 susp_inflamm 1 0 0 0
# 5 5 NA 0 0 0 0
We can make a bespoke function to do this within dplyr
:
fun <- function(x) apply(x > 0, 1, function(z) paste(names(x)[z], collapse = ", "))
practice |>
mutate(quux = fun(pick(starts_with("susp_"))))
# record_id susp_inflamm susp_bacteria susp_virus susp_fungus quux
# 1 1 1 1 0 0 susp_inflamm, susp_bacteria
# 2 2 0 0 1 0 susp_virus
# 3 3 0 0 0 1 susp_fungus
# 4 4 1 0 0 0 susp_inflamm
# 5 5 0 0 0 0
In data.table
-speak, this is fairly simple using the fun
defined above.
library(data.table)
practice[, quux := fun(.SD), .SDcols = patterns("^susp_")]
# record_id susp_inflamm susp_bacteria susp_virus susp_fungus quux
# <int> <num> <num> <num> <num> <char>
# 1: 1 1 1 0 0 susp_inflamm, susp_bacteria
# 2: 2 0 0 1 0 susp_virus
# 3: 3 0 0 0 1 susp_fungus
# 4: 4 1 0 0 0 susp_inflamm
# 5: 5 0 0 0 0