Search code examples
rgroup-by

Is there a way to collapse rows based on multiple column conditions, grouped by ID in R?


I have a df of episodic antibiotic data (long form). For each patient, I’m looking to combine specific antibiotics into “combo” therapies based on 1) a specific list of antibiotics that would be of interest, and 2) ONLY if those specific antibiotics had the same start dates. The specific therapies of interest are:

  • Ampicillin and ceftriaxone
  • Ampicillin and gentamicin
  • Ampicillin and streptomycin

I don’t need any other combinations (e.g. if ampicillin and daptomycin start on the same day, I don’t want that combined).

I have a patient ID by which the antibiotic episodes can be grouped. My start_date variable has already been coerced into date class (YYYY-MM-DD). All other variables are character.

Here’s a sample set of data. In my actual dataframe, have over 1,000 rows.

ID <- c("C1", "C1", "C1", "C2", "C3", "C3", "C3", "C4", "C4", "C5", "C5", "C5", "C6", "C6")

abx <- c("ampicillin", "ceftriaxone", "vancomycin", "linezolid", "ampicillin", "gentamicin", "vancomycin", "piptazo", "daptomycin", "ampicillin", "streptomycin", "vancomycin", "ampicillin", "daptomycin") 

date <- c(as.Date("2020-02-01"), as.Date("2020-02-01"), as.Date("2020-01-28"), as.Date("2020-03-01"), as.Date("2020-02-17"), as.Date("2020-02-17"), as.Date("2020-03-01"), as.Date("2020-01-11"), as.Date("2020-01-05"), as.Date("2020-01-05"), as.Date("2020-01-05"), as.Date("2020-01-01"), as.Date("2020-01-14"), as.Date("2020-01-14"))

df_abx <- data.frame(ID, abx, date)

This is the solution I’m hoping for:

ID abx start_date
C1 ampicillin, ceftriaxone 2020-02-01
C1 vancomycin 2020-01-28
C2 linezolid 2020-03-01
C3 ampicillin, gentamicin 2020-02-17
C3 vancomycin 2020-03-01
C4 piptazo 2020-01-11
C4 daptomycin 2020-01-05
C5 ampicillin, streptomycin 2020-01-05
C5 vancomycin 2020-01-01
C6 ampicillin 2020-01-14
C6 daptomycin 2020-01-14

I suspect the solution includes "group_by" for the patient ID, and I've been toying around with summarize and case_when but I haven't gotten to where I want to be.


Solution

  • We may group by 'ID', 'date', create a new grouping column with case_match by making a common group for the selected drugs, and then add the grouping, to summarise the 'abx' by pasteing (str_c) the values in abx

    library(dplyr)
    library(stringr)
    df_abx %>% 
     group_by(ID, date) %>%
      mutate(abx_grp = coalesce(case_match(abx,
       c('ampicillin', 'ceftriaxone', 'gentamicin', 'streptomycin') ~ 'g1'), abx)) %>%
      group_by(abx_grp, .add = TRUE) %>% 
     summarise(abx = str_c(abx, collapse = ", "), .groups = 'drop')  %>% 
     select(-abx_grp)
    

    -output

    # A tibble: 11 × 3
       ID    date       abx                     
       <chr> <date>     <chr>                   
     1 C1    2020-01-28 vancomycin              
     2 C1    2020-02-01 ampicillin, ceftriaxone 
     3 C2    2020-03-01 linezolid               
     4 C3    2020-02-17 ampicillin, gentamicin  
     5 C3    2020-03-01 vancomycin              
     6 C4    2020-01-05 daptomycin              
     7 C4    2020-01-11 piptazo                 
     8 C5    2020-01-01 vancomycin              
     9 C5    2020-01-05 ampicillin, streptomycin
    10 C6    2020-01-14 daptomycin              
    11 C6    2020-01-14 ampicillin