I have a df of episodic antibiotic data (long form). For each patient, I’m looking to combine specific antibiotics into “combo” therapies based on 1) a specific list of antibiotics that would be of interest, and 2) ONLY if those specific antibiotics had the same start dates. The specific therapies of interest are:
I don’t need any other combinations (e.g. if ampicillin and daptomycin start on the same day, I don’t want that combined).
I have a patient ID by which the antibiotic episodes can be grouped. My start_date variable has already been coerced into date class (YYYY-MM-DD). All other variables are character.
Here’s a sample set of data. In my actual dataframe, have over 1,000 rows.
ID <- c("C1", "C1", "C1", "C2", "C3", "C3", "C3", "C4", "C4", "C5", "C5", "C5", "C6", "C6")
abx <- c("ampicillin", "ceftriaxone", "vancomycin", "linezolid", "ampicillin", "gentamicin", "vancomycin", "piptazo", "daptomycin", "ampicillin", "streptomycin", "vancomycin", "ampicillin", "daptomycin")
date <- c(as.Date("2020-02-01"), as.Date("2020-02-01"), as.Date("2020-01-28"), as.Date("2020-03-01"), as.Date("2020-02-17"), as.Date("2020-02-17"), as.Date("2020-03-01"), as.Date("2020-01-11"), as.Date("2020-01-05"), as.Date("2020-01-05"), as.Date("2020-01-05"), as.Date("2020-01-01"), as.Date("2020-01-14"), as.Date("2020-01-14"))
df_abx <- data.frame(ID, abx, date)
This is the solution I’m hoping for:
ID | abx | start_date |
---|---|---|
C1 | ampicillin, ceftriaxone | 2020-02-01 |
C1 | vancomycin | 2020-01-28 |
C2 | linezolid | 2020-03-01 |
C3 | ampicillin, gentamicin | 2020-02-17 |
C3 | vancomycin | 2020-03-01 |
C4 | piptazo | 2020-01-11 |
C4 | daptomycin | 2020-01-05 |
C5 | ampicillin, streptomycin | 2020-01-05 |
C5 | vancomycin | 2020-01-01 |
C6 | ampicillin | 2020-01-14 |
C6 | daptomycin | 2020-01-14 |
I suspect the solution includes "group_by" for the patient ID, and I've been toying around with summarize and case_when but I haven't gotten to where I want to be.
We may group by 'ID', 'date', create a new grouping column with case_match
by making a common group for the selected drugs, and then add the grouping, to summarise the 'abx' by paste
ing (str_c
) the values in abx
library(dplyr)
library(stringr)
df_abx %>%
group_by(ID, date) %>%
mutate(abx_grp = coalesce(case_match(abx,
c('ampicillin', 'ceftriaxone', 'gentamicin', 'streptomycin') ~ 'g1'), abx)) %>%
group_by(abx_grp, .add = TRUE) %>%
summarise(abx = str_c(abx, collapse = ", "), .groups = 'drop') %>%
select(-abx_grp)
-output
# A tibble: 11 × 3
ID date abx
<chr> <date> <chr>
1 C1 2020-01-28 vancomycin
2 C1 2020-02-01 ampicillin, ceftriaxone
3 C2 2020-03-01 linezolid
4 C3 2020-02-17 ampicillin, gentamicin
5 C3 2020-03-01 vancomycin
6 C4 2020-01-05 daptomycin
7 C4 2020-01-11 piptazo
8 C5 2020-01-01 vancomycin
9 C5 2020-01-05 ampicillin, streptomycin
10 C6 2020-01-14 daptomycin
11 C6 2020-01-14 ampicillin