Is there a way to collapse rows based on multiple column conditions, grouped by ID in R?

I have a df of episodic antibiotic data (long form). For each patient, I’m looking to combine specific antibiotics into “combo” therapies based on 1) a specific list of antibiotics that would be of interest, and 2) ONLY if those specific antibiotics had the same start dates. The specific therapies of interest are:

Ampicillin and ceftriaxone
Ampicillin and gentamicin
Ampicillin and streptomycin

I don’t need any other combinations (e.g. if ampicillin and daptomycin start on the same day, I don’t want that combined).

I have a patient ID by which the antibiotic episodes can be grouped. My start_date variable has already been coerced into date class (YYYY-MM-DD). All other variables are character.

Here’s a sample set of data. In my actual dataframe, have over 1,000 rows.

ID <- c("C1", "C1", "C1", "C2", "C3", "C3", "C3", "C4", "C4", "C5", "C5", "C5", "C6", "C6")

abx <- c("ampicillin", "ceftriaxone", "vancomycin", "linezolid", "ampicillin", "gentamicin", "vancomycin", "piptazo", "daptomycin", "ampicillin", "streptomycin", "vancomycin", "ampicillin", "daptomycin") 

date <- c(as.Date("2020-02-01"), as.Date("2020-02-01"), as.Date("2020-01-28"), as.Date("2020-03-01"), as.Date("2020-02-17"), as.Date("2020-02-17"), as.Date("2020-03-01"), as.Date("2020-01-11"), as.Date("2020-01-05"), as.Date("2020-01-05"), as.Date("2020-01-05"), as.Date("2020-01-01"), as.Date("2020-01-14"), as.Date("2020-01-14"))

df_abx <- data.frame(ID, abx, date)

This is the solution I’m hoping for:

ID	abx	start_date
C1	ampicillin, ceftriaxone	2020-02-01
C1	vancomycin	2020-01-28
C2	linezolid	2020-03-01
C3	ampicillin, gentamicin	2020-02-17
C3	vancomycin	2020-03-01
C4	piptazo	2020-01-11
C4	daptomycin	2020-01-05
C5	ampicillin, streptomycin	2020-01-05
C5	vancomycin	2020-01-01
C6	ampicillin	2020-01-14
C6	daptomycin	2020-01-14

I suspect the solution includes "group_by" for the patient ID, and I've been toying around with summarize and case_when but I haven't gotten to where I want to be.

Solution

We may group by 'ID', 'date', create a new grouping column with case_match by making a common group for the selected drugs, and then add the grouping, to summarise the 'abx' by pasteing (str_c) the values in abx

library(dplyr)
library(stringr)
df_abx %>% 
 group_by(ID, date) %>%
  mutate(abx_grp = coalesce(case_match(abx,
   c('ampicillin', 'ceftriaxone', 'gentamicin', 'streptomycin') ~ 'g1'), abx)) %>%
  group_by(abx_grp, .add = TRUE) %>% 
 summarise(abx = str_c(abx, collapse = ", "), .groups = 'drop')  %>% 
 select(-abx_grp)

-output

# A tibble: 11 × 3
   ID    date       abx                     
   <chr> <date>     <chr>                   
 1 C1    2020-01-28 vancomycin              
 2 C1    2020-02-01 ampicillin, ceftriaxone 
 3 C2    2020-03-01 linezolid               
 4 C3    2020-02-17 ampicillin, gentamicin  
 5 C3    2020-03-01 vancomycin              
 6 C4    2020-01-05 daptomycin              
 7 C4    2020-01-11 piptazo                 
 8 C5    2020-01-01 vancomycin              
 9 C5    2020-01-05 ampicillin, streptomycin
10 C6    2020-01-14 daptomycin              
11 C6    2020-01-14 ampicillin