r, group-by, plyr, dplyr

Group rows by ID and then assign one treatment to each group


I am working with a long (person-period) dataset that is unbalanced (the number of observations varies by person). I want to randomly assign a treatment (A, B or C) in a new column, so that the treatment varies randomly between people but is the same on every row belonging to a given person. Each person gets one of three interventions, and that intervention stays the same across all of their observations.

Starting from just an ID column, I want to randomly assign the treatment. The final result would look something like this:

ID <- c(1,1,1,2,2,2,2,2,3,3,4,4,4,4,5,6,6,6,7,7)
Treatment <- c('a','a','a','b','b','b','b','b','c','c','a','a','a','a','b','c','c','c','a','a')

data <- data.frame(ID, Treatment)

data

I tried the example using ddply (How to generate a random treatment variable by factor?), however I want my treatment variable to be constant within the grouping variable.

Appreciate any help you can offer :)


Solution

  • You can use base R to do this with a merge:

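    set.seed(1)  # fix the seed so the assignment is reproducible

    # One row per person, each with a single randomly drawn treatment
    random_trt <- data.frame(ID = unique(data$ID),
                             New_Treatment = sample(c("a", "b", "c"),
                                                    size = length(unique(data$ID)),
                                                    replace = TRUE))

    # One-to-many merge: each person's treatment is repeated on all of their rows
    merge(data,
          random_trt,
          by = "ID",
          all.x = TRUE)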
    
       ID Treatment New_Treatment
    1   1         a             a
    2   1         a             a
    3   1         a             a
    4   2         b             c
    5   2         b             c
    6   2         b             c
    7   2         b             c
    8   2         b             c
    9   3         c             a
    10  3         c             a
    11  4         a             b
    12  4         a             b
    13  4         a             b
    14  4         a             b
    15  5         b             a
    16  6         c             c
    17  6         c             c
    18  6         c             c
    19  7         a             c
    20  7         a             c
    

    sample() draws one treatment at random (with replacement) for each unique ID. merge() then joins that lookup table back onto data as a one-to-many merge, so each person's treatment is repeated on every one of their rows.
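
    The same one-to-many lookup can also be done without merge(), which may be handy because merge() sorts the result by ID by default. This is only a minimal sketch: the names ids and trt_by_id are made up for illustration, and the data frame is rebuilt from the ID vector in the question.

    ```r
    # Example IDs from the question
    ID <- c(1,1,1,2,2,2,2,2,3,3,4,4,4,4,5,6,6,6,7,7)
    data <- data.frame(ID)

    set.seed(1)

    # One random draw per unique ID (ids and trt_by_id are illustrative names)
    ids <- unique(data$ID)
    trt_by_id <- sample(c("a", "b", "c"), size = length(ids), replace = TRUE)

    # match() returns, for every row, the position of its ID in ids,
    # so indexing trt_by_id with it repeats each person's treatment on all their rows
    data$New_Treatment <- trt_by_id[match(data$ID, ids)]

    data
    ```

    Because the new column is assigned in place, the rows stay in their original order.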


    Using dplyr:

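    library(dplyr)  # provides %>%, group_by(), mutate(), ungroup()
    set.seed(1)
    # sample() runs once per ID; the single value is recycled across that ID's rows
    data %>%
      group_by(ID) %>%
      mutate(New_Treatment = sample(c("a", "b", "c"), size = 1)) %>%
      ungroup()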
    
          ID Treatment New_Treatment
       <dbl> <chr>     <chr>        
     1     1 a         a            
     2     1 a         a            
     3     1 a         a            
     4     2 b         c            
     5     2 b         c            
     6     2 b         c            
     7     2 b         c            
     8     2 b         c            
     9     3 c         a            
    10     3 c         a            
    11     4 a         b            
    12     4 a         b            
    13     4 a         b            
    14     4 a         b            
    15     5 b         a            
    16     6 c         c            
    17     6 c         c            
    18     6 c         c            
    19     7 a         c            
    20     7 a         c
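
    Whichever approach you use, it is worth checking that the treatment really is constant within each ID. The sketch below runs the check on the example data from the question; n_trt is just an illustrative name, and you would point it at New_Treatment to check either result above.

    ```r
    # Example data from the question
    ID <- c(1,1,1,2,2,2,2,2,3,3,4,4,4,4,5,6,6,6,7,7)
    Treatment <- c('a','a','a','b','b','b','b','b','c','c','a','a','a','a','b','c','c','c','a','a')
    data <- data.frame(ID, Treatment)

    # Number of distinct treatments seen for each ID (n_trt is an illustrative name)
    n_trt <- tapply(data$Treatment, data$ID, function(x) length(unique(x)))

    # Stops with an error if any person has more than one treatment
    stopifnot(all(n_trt == 1))

    # People per treatment arm, counting each ID once
    table(data$Treatment[!duplicated(data$ID)])
    ```

    The final table counts people rather than rows, which matters in an unbalanced dataset where some IDs contribute many more rows than others.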