I have 33 students that I want to sort into groups of 6 (or as close as possible) on 5 different occasions. So I assign each student a number between 1 and 6 on each occasion.
I've managed the following:
studentlist <- data.frame(id = 1:33)
studentlist$Occassion1 <- sample(factor(rep(1:6, length.out=nrow(studentlist)),
labels=paste0(1:6)))
studentlist$Occassion2 <- sample(factor(rep(1:6, length.out=nrow(studentlist)),
labels=paste0(1:6)))
studentlist$Occassion3 <- sample(factor(rep(1:6, length.out=nrow(studentlist)),
labels=paste0(1:6)))
studentlist$Occassion4 <- sample(factor(rep(1:6, length.out=nrow(studentlist)),
labels=paste0(1:6)))
studentlist$Occassion5 <- sample(factor(rep(1:6, length.out=nrow(studentlist)),
labels=paste0(1:6)))
This seems to work. As I understand it, I am asking for a random sample between 1 and 6 for each row.
How does the length.out argument from rep() interact with sample()?
When I run a frequency table to check the sizes of the groups, I find the following:
numb=1,2,3,4,5,6. size=6,6,6,5,5,5.
I tried asking for 7 groups instead, and got group sizes of:
numb=1,2,3,4,5,6,7. size=5,5,5,5,5,4,4.
Why are they organised in this decreasing fashion?
You get this specific pattern because of how rep() works with length.out. If you want to create 6 groups,
rep(1:6, length.out = 33)
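will first repeat the numbers 1 to 6 five times (generating 30 values) and then fill the 3 remaining positions with the values 1, 2 and 3. sample() then only shuffles the order of that vector; it does not change how often each value occurs. So groups 1, 2 and 3 will always have one more student than the others.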
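To check this yourself, here is a minimal sketch that tabulates the group sizes before and after shuffling, for both 6 and 7 groups. The variable names and the seed are my own choices for illustration, not part of your code. The last line is one possible way, if you don't want the low-numbered groups to always be the bigger ones, to randomise which labels receive the extra students.

```r
n_students <- 33

# Unshuffled assignment: 1..6 is recycled until 33 values exist.
groups6 <- rep(1:6, length.out = n_students)
counts6 <- table(groups6)

# sample() with no other arguments returns a random permutation of its input,
# so the group sizes are the same before and after shuffling.
set.seed(1)  # arbitrary seed, only so the shuffle is reproducible
shuffled6 <- sample(groups6)
counts6_shuffled <- table(shuffled6)

# Same check with 7 groups: 33 = 4 * 7 + 5, so groups 1 to 5 get one extra student.
counts7 <- table(rep(1:7, length.out = n_students))

# Optional: permute the labels first, so that the three larger groups
# are picked at random instead of always being 1, 2 and 3.
random_big <- sample(rep(sample(1:6), length.out = n_students))
counts_random <- table(random_big)

print(counts6)
print(counts7)
```

The sizes themselves stay fixed (three groups of 6 and three of 5 for 33 students); only which students, and in the last variant which group numbers, end up in the larger groups is random.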