R: Assigning students to equal groups with random sampling. How does rep()'s length.out argument interact with sample()?


I have 33 students whom I want to sort into groups of 6 (or as close as possible) on 5 different occasions. So on each occasion I assign every student a number between 1 and 6.

I've managed the following:

studentlist <- data.frame(id = 1:33)

studentlist$Occassion1 <- sample(factor(rep(1:6, length.out=nrow(studentlist)), 
                                 labels=paste0(1:6)))
studentlist$Occassion2 <- sample(factor(rep(1:6, length.out=nrow(studentlist)), 
                                 labels=paste0(1:6)))
studentlist$Occassion3 <- sample(factor(rep(1:6, length.out=nrow(studentlist)), 
                                 labels=paste0(1:6)))
studentlist$Occassion4 <- sample(factor(rep(1:6, length.out=nrow(studentlist)), 
                                 labels=paste0(1:6)))
studentlist$Occassion5 <- sample(factor(rep(1:6, length.out=nrow(studentlist)), 
                                 labels=paste0(1:6)))

This seems to work. As I understand it, I am asking for a random sample between 1 and 6 for each row.

How does the length.out argument from rep() interact with sample()?

When I run a frequency table to check the sizes of the groups, I find the following:

group = 1, 2, 3, 4, 5, 6
size  = 6, 6, 6, 5, 5, 5

I tried asking for 7 groups instead, and got group sizes of:

group = 1, 2, 3, 4, 5, 6, 7
size  = 5, 5, 5, 5, 5, 4, 4

Why are they organised in this decreasing fashion?


Solution

  • You get this specific pattern because of how the rep() function works with length.out. If you want 6 groups for 33 students,

    rep(1:6, length.out = 33) 
    

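    will first repeat the numbers 1 to 6 five times (generating 30 values) and then fill the 3 remaining positions with the values 1, 2 and 3. sample() called on that vector only shuffles its order; it does not change how often each value occurs. So you will always have one more student in groups 1, 2 and 3. The same reasoning explains the 7-group case: 33 = 4 × 7 + 5, so groups 1 to 5 get 5 students each and groups 6 and 7 get 4.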
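
The behaviour described above can be checked directly. The following is a minimal sketch, not the asker's exact code: the variable names (`n_students`, `n_groups`, `labels`, `shuffled`, `random_labels`, `assignment`) and the seed are assumptions chosen for illustration. The last part shows one possible variation, shuffling the group numbers before recycling them, in case you would rather not have the larger groups always be 1, 2 and 3.

```r
# Number of students and groups, taken from the question
n_students <- 33
n_groups <- 6

# rep() recycles 1:6 until 33 values exist: 5 full cycles, then 1, 2, 3
labels <- rep(seq_len(n_groups), length.out = n_students)
table(labels)

# sample() without a size argument only permutes its input,
# so the group counts are identical before and after shuffling
set.seed(1)  # arbitrary seed, used only to make the run reproducible
shuffled <- sample(labels)
table(shuffled)

# Assumed variation: permute the group numbers first, so that the
# three larger groups are picked at random rather than always 1, 2, 3
random_labels <- rep(sample(n_groups), length.out = n_students)
assignment <- sample(random_labels)
table(assignment)
```

In the variation, three groups still get 6 students and three get 5; only which group numbers end up larger changes from run to run.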