Random sampling in R without direct repetition and exact quantity of each number


How can I randomly sample the color order of 368 images using 4 colors that

  • should not be repeated directly ("red" "red" "blue" would not be ok, but "red" "blue" "red" would be)
  • should each appear equally often (92 times each, because 368/4 = 92)?

Based on this, I have already managed the sampling without direct repetition:

set.seed(340)
values <- c("blue", "red", "green", "yellow")
len <- 368 # number of samples
samp <- sample(values, 1) # initialise with the first colour
# each later draw excludes the previous colour; <<- writes into the global samp
cols <- sapply(2:len, function(i) samp[i] <<- sample(setdiff(values, samp[i-1]), 1, replace = TRUE))
# cols covers draws 2..368 (367 values); samp holds the full sequence
table(cols) # colors appear 94, 92, 88, 93 times

I tried building a for loop that keeps sampling until the exact counts are reached, using if(table(cols)[1:4] == 92), but it didn't work, and after a lot of research I still don't know how to proceed. I would be really thankful for tips and help!
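One likely reason the if() check fails: table(cols)[1:4] == 92 has four elements, and if() needs a single TRUE/FALSE (R >= 4.2 raises an error for a longer condition; earlier versions only warned and used the first element). The helper below is a hypothetical sketch (is_valid_sequence is not from the question) that tests both requirements at once, collapsing the count check with all(). It uses factor() with explicit levels so a colour that never appears is counted as 0 instead of being dropped by table().

```r
# Hypothetical helper: TRUE if x has no direct repetition and every colour
# in `values` appears exactly `each` times.
is_valid_sequence <- function(x, values, each) {
  # factor() with explicit levels keeps absent colours as count 0,
  # which a plain table(x) would silently omit
  counts <- table(factor(x, levels = values))
  # compare each element with its successor to detect direct repetition
  no_repeat <- !any(head(x, -1) == tail(x, -1))
  # all() turns the per-colour comparisons into one TRUE/FALSE for if()
  no_repeat && all(counts == each)
}

is_valid_sequence(c("red", "blue", "red", "blue"), c("red", "blue"), 2) # TRUE
is_valid_sequence(c("red", "red", "blue", "blue"), c("red", "blue"), 2) # FALSE
```

With the question's variables, a check would look like is_valid_sequence(samp, values, 92), which returns a single logical value that if() accepts.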


Solution

  • You can try this. The idea is to build the full pool of colours first (seqc, each colour 92 times). Then, in each iteration, the algorithm samples one of the remaining values that differs from the previous pick and removes that value from the pool.

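    values <- c("blue", "red", "green", "yellow")
    seqc <- rep(values, each = 92)        # pool: every colour exactly 92 times
    vec <- sample(seqc, 1)                # first pick: any colour
    seqc <- seqc[-match(vec, seqc)]       # remove one copy of it from the pool
    for (i in 2:368){
      # pick from the remaining pool, excluding copies of the previous colour;
      # errors if only copies of the previous colour are left
      vec[i] <- sample(seqc[seqc != vec[i - 1]], 1)
      seqc <- seqc[-match(vec[i], seqc)]  # remove one copy of the pick
    }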
    

    output

    head(vec)
    # [1] "red"    "blue"   "red"    "yellow" "blue"   "yellow"
    
    table(vec)
    #vec
    #  blue  green    red yellow 
    #    92     92     92     92
    

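    It can throw an error: if, near the end, the only values left in seqc are copies of the previous pick (for example, a single "red" remains right after "red" was drawn), sample() receives an empty vector and fails. In that case, rerun it until it works; it usually takes no more than 3 attempts.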
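A minimal sketch of the "rerun it until it works" advice, assuming the same four colours: the answer's loop is wrapped in a function and retried with tryCatch() until one attempt finishes. The names draw_once and draw_with_retry, and the max_tries cap (which stops the loop from running forever), are hypothetical additions, not part of the original answer.

```r
# Hypothetical helper: one attempt of the answer's algorithm.
# Throws an error when the pool only holds copies of the previous colour.
draw_once <- function(values, each) {
  seqc <- rep(values, each = each)
  n <- length(seqc)
  vec <- sample(seqc, 1)
  seqc <- seqc[-match(vec, seqc)]
  for (i in seq_len(n)[-1]) {
    vec[i] <- sample(seqc[seqc != vec[i - 1]], 1)
    seqc <- seqc[-match(vec[i], seqc)]
  }
  vec
}

# Hypothetical wrapper: rerun draw_once() until an attempt succeeds,
# giving up after max_tries (an assumed cap).
draw_with_retry <- function(values, each, max_tries = 100) {
  for (attempt in seq_len(max_tries)) {
    res <- tryCatch(draw_once(values, each), error = function(e) NULL)
    if (!is.null(res)) return(res)
  }
  stop("no valid sequence found after ", max_tries, " attempts")
}

set.seed(340)
vec <- draw_with_retry(c("blue", "red", "green", "yellow"), each = 92)
table(vec)
```

Any attempt that finishes is valid by construction: every pick differs from the previous one, and the whole pool is consumed, so each colour appears exactly 92 times. Note that this greedy scheme is not guaranteed to draw uniformly from all valid sequences.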