The Problem:
I am attempting to use R to generate a random study design in which half of the participants are randomly assigned to "Treatment 1" and the other half to "Treatment 2". Because half of the subjects are male and half are female, and I want an equal number of males and females exposed to each treatment, half of the males and half of the females should be assigned to "Treatment 1" and the remaining halves to "Treatment 2".
There are two complications to this design: (1) this is a yearlong study, and participants must be assigned to a treatment on a daily basis; and (2) each participant must be exposed to "Treatment 1" a minimum of 10 times in a 28-day period.
Is it even possible to automate this in R? I assume so, but as a beginner R programmer I have not been able to find the solution on my own. I have been struggling for days to implement this, and the similar-sounding posts I found on this site did not apply to my case. I am hoping someone knows some tricks that could help me get unstuck. Any advice would be greatly appreciated!
What I Have Tried:
Specific Information
# There are 16 participants
p <- c("P01", "P02", "P03", "P04", "P05", "P06", "P07", "P08", "P09", "P10", "P11", "P12", "P13", "P14", "P15", "P16")
# Half are male and half are female
g <- c(rep("M", 8), rep("F", 8))
# I make a dataframe but this may not be necessary
df <- data.frame(p, g)
# There are 365 days in one year
d <- seq(1,365,1)
...unfortunately, I am not sure how to proceed from here.
Ideal Outcome:
I am envisioning something like this table as the outcome:
Basically there is a column for each participant and a row for each day. Associated with each day is an assignment to either Treatment 1 (T1) or Treatment 2 (T2), with 4 of the 8 males and 4 of the 8 females being assigned to T1 and the remainder to T2. These treatments are reassigned every day for 1 year. Not depicted in this chart is the need for each participant to be exposed to T1 at least 10 times in a 28-day period. The table does not have to look like that if something else makes more sense!
Consider splitting the data frame by day and gender with by, then run enough samples with replicate (here, 100 times) to pick one of several where the treatments are balanced:
df <- merge(data.frame(participant = p, gender = g),
            data.frame(days = seq(1, 365)),
            by = NULL)                                         # CROSS JOIN: EVERY PARTICIPANT ON EVERY DAY
df_list <- by(df, list(df$gender, df$days), function(sub){
  t <- replicate(100, {                                        # RUN 100 REPETITIONS OF EXPRESSION
    s <- sample(c("T1", "T2"), size=nrow(sub), replace=TRUE)   # SAMPLE "T1" AND "T2" BY SIZE OF SUBSET
    s[ sum(s == "T1") == sum(s == "T2") ]                      # FILTER TO EQUAL TREATMENTS
  }, simplify = FALSE)                                         # ALWAYS RETURN A LIST
  t <- Filter(length, t)[[1]]                                  # KEEP FIRST BALANCED DRAW
  transform(sub, treatment = t)                                # ASSIGN RESULT TO NEW COLUMN
})
final_df <- data.frame(do.call(rbind, df_list), row.names=NULL)
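The rejection step above (draw with replacement, then keep only balanced draws) can be avoided, because shuffling a vector that already holds equal numbers of each label is balanced by construction. A minimal sketch of that swapped-in technique, assuming every gender-by-day subset has an even number of rows; assign_balanced is a hypothetical helper name and is not part of the code above:

```r
# Hypothetical helper: permute a label vector that is balanced by construction
assign_balanced <- function(n) {
  stopifnot(n %% 2 == 0)                     # assumption: even group size
  sample(rep(c("T1", "T2"), each = n / 2))   # random order, exactly n/2 of each label
}

set.seed(1)                                  # only to make the sketch reproducible
s <- assign_balanced(8)
table(s)                                     # counts per treatment: 4 and 4
```

Inside the by call, transform(sub, treatment = assign_balanced(nrow(sub))) would then replace the replicate and filter lines.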
Day 1
head(final_df, 16)
# participant gender days treatment
# 1 P09 F 1 T1
# 2 P10 F 1 T2
# 3 P11 F 1 T2
# 4 P12 F 1 T1
# 5 P13 F 1 T2
# 6 P14 F 1 T2
# 7 P15 F 1 T1
# 8 P16 F 1 T1
# 9 P01 M 1 T1
# 10 P02 M 1 T1
# 11 P03 M 1 T2
# 12 P04 M 1 T2
# 13 P05 M 1 T2
# 14 P06 M 1 T1
# 15 P07 M 1 T1
# 16 P08 M 1 T2
Day 365
tail(final_df, 16)
# participant gender days treatment
# 5825 P09 F 365 T2
# 5826 P10 F 365 T2
# 5827 P11 F 365 T1
# 5828 P12 F 365 T2
# 5829 P13 F 365 T1
# 5830 P14 F 365 T2
# 5831 P15 F 365 T1
# 5832 P16 F 365 T1
# 5833 P01 M 365 T1
# 5834 P02 M 365 T2
# 5835 P03 M 365 T1
# 5836 P04 M 365 T2
# 5837 P05 M 365 T2
# 5838 P06 M 365 T2
# 5839 P07 M 365 T1
# 5840 P08 M 365 T1
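The printed rows show the daily gender balance but not the question's second requirement, at least 10 exposures to T1 in a 28-day period. Since each day is drawn independently, nothing in the schedule guarantees it, so it may be worth checking. The sketch below assumes a long table with the columns shown above and reads "a 28-day period" as every rolling window of 28 consecutive days, which is an interpretation and not stated in the question. min_t1_in_window is a hypothetical helper, and the block rebuilds a stand-in schedule so that it runs on its own:

```r
# Hypothetical helper: smallest T1 count over all runs of `width` consecutive days
min_t1_in_window <- function(treatment, width = 28) {
  cs <- cumsum(c(0, treatment == "T1"))      # running T1 count with a leading 0
  n  <- length(treatment)
  min(cs[(width + 1):(n + 1)] - cs[1:(n - width + 1)])
}

# Stand-in schedule with the same columns as final_df (balanced within gender and day)
set.seed(42)
p <- sprintf("P%02d", 1:16)
g <- rep(c("M", "F"), each = 8)
final_df <- merge(data.frame(participant = p, gender = g),
                  data.frame(days = 1:365), by = NULL)
final_df$treatment <- ave(character(nrow(final_df)), final_df$gender, final_df$days,
                          FUN = function(x) sample(rep(c("T1", "T2"), length.out = length(x))))

# Requirement 1: every gender-by-day group is split evenly
counts <- table(final_df$gender, final_df$days, final_df$treatment)
all(counts[, , "T1"] == counts[, , "T2"])

# Requirement 2: worst rolling 28-day T1 count for each participant
final_df <- final_df[order(final_df$participant, final_df$days), ]
worst <- tapply(final_df$treatment, final_df$participant, min_t1_in_window)
worst[worst < 10]                            # participants (if any) that break the rule
```

If any participants are listed, one option is to redraw the schedule until none remain, although a design that enforces the minimum directly may be more reliable.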
Ideally, for analytical purposes, you should keep the data in long format (i.e., tidy data). But if you need wide format, consider reshape with some helper and cleanup processing:
final_df$participant_gender <- with(final_df, paste0(participant, gender))
new_names <- paste0(p, g)
wide_df <- reshape(final_df, v.names = "treatment", timevar = "participant_gender",
                   idvar = "days", drop = c("gender", "participant"),
                   new.row.names = 1:365, direction = "wide")
names(wide_df) <- gsub("treatment.", "", names(wide_df), fixed = TRUE)  # STRIP PREFIX LITERALLY
wide_df <- wide_df[c("days", new_names)]
# days P01M P02M P03M P04M P05M P06M P07M P08M P09F P10F P11F P12F P13F P14F P15F P16F
# 1 1 T1 T1 T2 T2 T2 T1 T1 T2 T1 T2 T2 T1 T2 T2 T1 T1
# 2 2 T1 T1 T2 T1 T2 T1 T2 T2 T1 T2 T2 T1 T2 T2 T1 T1
# 3 3 T1 T1 T2 T1 T1 T2 T2 T2 T1 T2 T2 T2 T1 T2 T1 T1
# 4 4 T1 T1 T1 T2 T2 T2 T1 T2 T2 T1 T1 T2 T2 T1 T1 T2
# 5 5 T1 T1 T2 T1 T2 T2 T1 T2 T1 T1 T2 T1 T2 T2 T1 T2
# 6 6 T2 T1 T1 T1 T2 T2 T1 T2 T2 T2 T2 T1 T2 T1 T1 T1