
Use R to Randomly Assign Participants to Treatments on a Daily Basis


The Problem:

I am attempting to use R to generate a random study design in which half of the participants are randomly assigned to "Treatment 1" and the other half to "Treatment 2". Half of the subjects are male and half are female, and I want an equal number of males and females exposed to each treatment, so half of the males and half of the females should be assigned to "Treatment 1" and the remaining halves to "Treatment 2".

There are two complications to this design: (1) this is a yearlong study, and the assignment of participants to treatments must occur on a daily basis; and (2) each participant must be exposed to "Treatment 1" a minimum of 10 times in a 28-day period.

Is it even possible to automate this in R? I assume so, but as a beginner R programmer I have not been able to find the solution on my own. I have been struggling for days to work out how to do this, and have looked through many similar-sounding posts on this site without being able to apply them here. I am hoping someone knows some tricks that could help me get unstuck; any advice would be greatly appreciated!

What I Have Tried:

Specific Information

# There are 16 participants
p <- c("P01", "P02", "P03", "P04", "P05", "P06", "P07", "P08", "P09", "P10", "P11", "P12", "P13", "P14", "P15", "P16")

# Half are male and half are female
g <- c(rep("M", 8), rep("F", 8))

# I make a dataframe but this may not be necessary
df <- cbind.data.frame(p,g)

# There are 365 days in one year
d <- seq(1,365,1)

...unfortunately, I am not sure how to proceed from here.

Ideal Outcome:

I am envisioning something like this table as the outcome. I do not have enough reputation points to embed images yet, so here is the link, sorry!

Basically there is a column for each participant and a row for each day. Associated with each day is an assignment to either Treatment 1 (T1) or Treatment 2 (T2), with 4 of the 8 males and 4 of the 8 females being assigned to T1 and the remainder to T2. These treatments are reassigned every day for 1 year. Not depicted in this chart is the need for each participant to be exposed to T1 at least 10 times in a 28-day period. The table does not have to look like that if something else makes more sense!


Solution

  • Consider splitting the data frame by day and gender with by, then using replicate to draw 100 random samples per group and keeping the first one in which the two treatments are balanced:

    Data

    df <- merge(data.frame(participant = p, gender = g), 
                data.frame(days = seq(1,365)), 
                by=NULL)
    

    Solution

    df_list <- by(df, list(df$gender, df$days), function(sub){
      # DRAW 100 CANDIDATE ASSIGNMENTS; simplify=FALSE ALWAYS RETURNS A LIST
      draws <- replicate(100, {
        s <- sample(c("T1", "T2"), size=nrow(sub), replace=TRUE)   # SAMPLE "T1" AND "T2" BY SIZE OF SUBSET
        s[ sum(s == "T1") == sum(s == "T2") ]                      # KEEP DRAW ONLY IF BALANCED, ELSE EMPTY
      }, simplify=FALSE)
    
      trt <- Filter(length, draws)[[1]]       # SELECT FIRST NON-EMPTY (BALANCED) DRAW
      transform(sub, treatment = trt)         # ASSIGN RESULT TO NEW COLUMN
    })
    
    # BIND DATA FRAMES AND RESET ROW.NAMES
    final_df <- data.frame(do.call(rbind.data.frame, df_list), row.names=NULL)
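
The draw-and-filter step above can also be avoided, because shuffling a vector that already holds equal numbers of each label is balanced by construction. The following is a minimal sketch of that alternative, not the approach used in this answer; it assumes every gender-by-day group has an even size, and `balanced_labels` is a hypothetical helper name.

```r
# Rebuild the long-format frame so the example is self-contained
p <- sprintf("P%02d", 1:16)
g <- rep(c("M", "F"), each = 8)
df <- merge(data.frame(participant = p, gender = g),
            data.frame(days = 1:365),
            by = NULL)                        # by = NULL gives the cross join

# Hypothetical helper: shuffle a vector holding n/2 "T1" and n/2 "T2" labels
balanced_labels <- function(n) {
  stopifnot(n %% 2 == 0)                      # assumption: even group size
  sample(rep(c("T1", "T2"), each = n / 2))
}

set.seed(42)                                  # arbitrary seed, for reproducibility

# ave() applies the function within each gender-by-day group and keeps row order
df$treatment <- ave(character(nrow(df)), df$gender, df$days,
                    FUN = function(x) balanced_labels(length(x)))

# Number of "T1" assignments in each gender-by-day cell (should all be 4)
table(tapply(df$treatment == "T1", list(df$gender, df$days), sum))
```

Because no draw is ever discarded, this version does not depend on one of 100 samples happening to be balanced.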
    

    Output

    Day 1

    head(final_df, 16)
    
    #    participant gender days treatment
    # 1          P09      F    1        T1
    # 2          P10      F    1        T2
    # 3          P11      F    1        T2
    # 4          P12      F    1        T1
    # 5          P13      F    1        T2
    # 6          P14      F    1        T2
    # 7          P15      F    1        T1
    # 8          P16      F    1        T1
    # 9          P01      M    1        T1
    # 10         P02      M    1        T1
    # 11         P03      M    1        T2
    # 12         P04      M    1        T2
    # 13         P05      M    1        T2
    # 14         P06      M    1        T1
    # 15         P07      M    1        T1
    # 16         P08      M    1        T2
    

    Day 365

    tail(final_df, 16)
    
    #      participant gender days treatment
    # 5825         P09      F  365        T2
    # 5826         P10      F  365        T2
    # 5827         P11      F  365        T1
    # 5828         P12      F  365        T2
    # 5829         P13      F  365        T1
    # 5830         P14      F  365        T2
    # 5831         P15      F  365        T1
    # 5832         P16      F  365        T1
    # 5833         P01      M  365        T1
    # 5834         P02      M  365        T2
    # 5835         P03      M  365        T1
    # 5836         P04      M  365        T2
    # 5837         P05      M  365        T2
    # 5838         P06      M  365        T2
    # 5839         P07      M  365        T1
    # 5840         P08      M  365        T1
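
Note that the daily sampling shown here balances treatments within each gender but does nothing to enforce the second requirement in the question (at least 10 exposures to T1 in a 28-day period). One way to see whether a generated schedule happens to satisfy it is to count T1 days in every rolling 28-day window per participant. This is only a sketch: it assumes the requirement applies to every window of 28 consecutive days, `min_t1_in_window` is a hypothetical helper, and the small `toy` data frame stands in for `final_df`.

```r
# Hypothetical helper: smallest number of T1 days in any window of `window`
# consecutive days, given one participant's treatments ordered by day
min_t1_in_window <- function(treatment, window = 28) {
  n <- length(treatment)
  stopifnot(n >= window)
  cum <- c(0, cumsum(treatment == "T1"))      # cum[i + 1] = T1 count in days 1..i
  ends <- window:n                            # last day of each full window
  min(cum[ends + 1] - cum[ends - window + 1])
}

# Toy stand-in for final_df: two participants over 56 days
toy <- data.frame(
  participant = rep(c("A", "B"), each = 56),
  days        = rep(1:56, times = 2),
  treatment   = c(rep(c("T1", "T2"), times = 28),        # A alternates daily
                  rep(c("T1", "T2"), times = c(9, 47)))  # B gets T1 on days 1-9 only
)
toy <- toy[order(toy$participant, toy$days), ]           # days ascending per participant

# Worst-case window per participant; values below 10 violate the requirement
worst <- tapply(toy$treatment, toy$participant, min_t1_in_window)
worst
names(worst)[worst < 10]
```

The same call should work on the real output as `tapply(final_df$treatment, final_df$participant, min_t1_in_window)`, provided the rows are ordered by day within each participant.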
    

    Ideally, for analytical purposes, you should keep the data in long format (i.e., tidy data). If you need wide format, consider reshape with a few helper objects and some cleanup:

    # HELPER OBJECTS
    final_df$participant_gender <- with(final_df, paste0(participant, gender))
    new_names <- paste0(p, g)
    
    # RESHAPE WIDE
    wide_df <- reshape(final_df, v.names = "treatment", timevar = "participant_gender", 
                       idvar="days", drop = c("gender", "participant"), 
                       new.row.names = 1:365, direction = "wide")
    
    # RENAME AND RE-ORDER COLUMNS
    names(wide_df) <- sub("^treatment\\.", "", names(wide_df))   # ESCAPE THE DOT SO IT MATCHES LITERALLY
    wide_df <- wide_df[c("days", new_names)]
    
    head(wide_df)
    #   days P01M P02M P03M P04M P05M P06M P07M P08M P09F P10F P11F P12F P13F P14F P15F P16F
    # 1    1   T1   T1   T2   T2   T2   T1   T1   T2   T1   T2   T2   T1   T2   T2   T1   T1
    # 2    2   T1   T1   T2   T1   T2   T1   T2   T2   T1   T2   T2   T1   T2   T2   T1   T1
    # 3    3   T1   T1   T2   T1   T1   T2   T2   T2   T1   T2   T2   T2   T1   T2   T1   T1
    # 4    4   T1   T1   T1   T2   T2   T2   T1   T2   T2   T1   T1   T2   T2   T1   T1   T2
    # 5    5   T1   T1   T2   T1   T2   T2   T1   T2   T1   T1   T2   T1   T2   T2   T1   T2
    # 6    6   T2   T1   T1   T1   T2   T2   T1   T2   T2   T2   T2   T1   T2   T1   T1   T1
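
If the 28-day minimum from the question has to hold by construction rather than by chance, one option is to restrict the randomization. The sketch below is an alternative design under stated assumptions, not the method used in this answer: within each gender, days are taken in pairs, a random half of the group gets T1 on the first day of a pair, and the other half gets T1 on the second day. Every participant then receives T1 once in each 2-day pair, so any 28 consecutive days contain at least 13 T1 days. The price is that the second day of each pair is fully determined by the first. `paired_day_schedule` is a hypothetical helper name.

```r
# Hypothetical helper: returns a days x participants matrix of "T1"/"T2"
paired_day_schedule <- function(participants, gender, n_days = 365) {
  sched <- matrix("T2", nrow = n_days, ncol = length(participants),
                  dimnames = list(NULL, participants))
  for (first in seq(1, n_days, by = 2)) {     # first day of each 2-day pair
    for (grp in unique(gender)) {
      idx <- which(gender == grp)
      today <- idx[sample.int(length(idx), length(idx) / 2)]  # random half gets T1
      sched[first, today] <- "T1"
      if (first < n_days) {
        sched[first + 1, setdiff(idx, today)] <- "T1"         # other half, next day
      }
    }
  }
  sched
}

p <- sprintf("P%02d", 1:16)
g <- rep(c("M", "F"), each = 8)

set.seed(1)                                   # arbitrary seed, for reproducibility
sched <- paired_day_schedule(p, g)

# Daily balance: number of T1 assignments per gender on each day (should be 4)
daily_t1 <- sapply(split(seq_along(g), g),
                   function(i) rowSums(sched[, i] == "T1"))
range(daily_t1)

# Smallest T1 count over all rolling 28-day windows, per participant
roll_min <- apply(sched == "T1", 2, function(x) {
  cum <- c(0, cumsum(x))
  min(cum[29:(length(x) + 1)] - cum[1:(length(x) - 27)])
})
range(roll_min)
```

The resulting matrix has the same days-by-participants layout as `wide_df`, so it can be converted with `as.data.frame(sched)` if a data frame is preferred.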