Search code examples
rdplyrcase-when

Add column based on condition in other column per group in dplyr with data in a long format, using dplyr


I have data of repeated measurements (4 or 5 times) of 3 patients in a long format:

library(dplyr)
library(magrittr)

questiondata <- structure(list(ID = c(2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 
4), time = c("time1", "time2", "time3", "time4", "time1", "time2", 
"time3", "time4", "time5", "time1", "time2", "time3", "time4", 
"time5"), drug_use = structure(c(NA, 1L, NA, NA, NA, 2L, NA, 
NA, NA, NA, 1L, NA, NA, NA), .Label = c("no", "yes"), class = "factor")), row.names = c(NA, 
-14L), class = c("tbl_df", "tbl", "data.frame"))

# Corresponding to the following tibble:
# A tibble: 14 x 3
      ID time  drug_use
   <dbl> <chr> <fct>   
 1     2 time1 NA      
 2     2 time2 no      
 3     2 time3 NA      
 4     2 time4 NA      
 5     3 time1 NA      
 6     3 time2 yes     
 7     3 time3 NA      
 8     3 time4 NA      
 9     3 time5 NA      
10     4 time1 NA      
11     4 time2 no      
12     4 time3 NA      
13     4 time4 NA      
14     4 time5 NA       

I am trying to make a new column on drug use (named 'drug_use_2) that says 'yes' or 'no' per patient, based on if that patient used drugs at any time. So the desired result would look like this:

 A tibble: 14 x 4
      ID time  drug_use drug_use2
   <dbl> <chr> <fct>    <fct>    
 1     2 time1 NA       no       
 2     2 time2 no       no       
 3     2 time3 NA       no       
 4     2 time4 NA       no       
 5     3 time1 NA       yes       
 6     3 time2 yes      yes      
 7     3 time3 NA       yes      
 8     3 time4 NA       yes      
 9     3 time5 NA       yes      
10     4 time1 NA       no      
11     4 time2 no       no       
12     4 time3 NA       no       
13     4 time4 NA       no       
14     4 time5 NA       no  

I have tried grouping on PXE number and/or on drug_use and then using mutate with case when, but that gets stuck on the missing values for drug_use:

questiondata <- questiondata %>% 
  group_by(ID) %>% 
  mutate(drug_use2=
           case_when(
             drug_use=="yes" ~ "yes", 
             drug_use=="no" ~ "no", 
             TRUE ~ "missing"))

# A tibble: 14 x 4
# Groups:   ID [3]
      ID time  drug_use drug_use2
   <dbl> <chr> <fct>    <chr>    
 1     2 time1 NA       missing  
 2     2 time2 no       no       
 3     2 time3 NA       missing  
 4     2 time4 NA       missing  
 5     3 time1 NA       missing  
 6     3 time2 yes      yes      
 7     3 time3 NA       missing  
 8     3 time4 NA       missing  
 9     3 time5 NA       missing  
10     4 time1 NA       missing  
11     4 time2 no       no       
12     4 time3 NA       missing  
13     4 time4 NA       missing  
14     4 time5 NA       missing 

Am I using the group_by wrong? Thanks in advance


Solution

  • Use any

    questiondata %>%
      group_by(ID) %>%
      mutate(druguse2 = case_when(any(drug_use == 'yes') ~ 'yes',
                                  TRUE ~ 'no'))
    
    # A tibble: 14 x 4
    # Groups:   ID [3]
          ID time  drug_use druguse2
       <dbl> <chr> <fct>    <chr>   
     1     2 time1 NA       no      
     2     2 time2 no       no      
     3     2 time3 NA       no      
     4     2 time4 NA       no      
     5     3 time1 NA       yes     
     6     3 time2 yes      yes     
     7     3 time3 NA       yes     
     8     3 time4 NA       yes     
     9     3 time5 NA       yes     
    10     4 time1 NA       no      
    11     4 time2 no       no      
    12     4 time3 NA       no      
    13     4 time4 NA       no      
    14     4 time5 NA       no