
Randomly sample table cells - equal N across rows and columns


I plan to have 12 people answer 300 questions. Each subject will answer 100 questions, and each question will be answered by 4 subjects.

For various reasons, the assignment must be random. Here is my current approach, but I am open to other ideas.

I created a blank 300 x 12 data frame (300 rows named by question IDs and 12 columns for subjects). For each subject column, I randomly sample 100 rows and put a 1 in those 100 cells. This guarantees that each subject is randomly assigned 100 questions, but not every question ends up being answered by exactly 4 subjects.
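
For reference, the approach described above can be sketched in base R roughly as follows. The variable names (`design`, `n_questions`, and so on) and the seed are assumptions made for illustration; the original code was not posted.

```r
## Sketch of the column-by-column sampling described above (names are illustrative)
set.seed(42)  # arbitrary seed, only for reproducibility

n_questions <- 300
n_subjects  <- 12
per_subject <- 100

design <- matrix(0L, nrow = n_questions, ncol = n_subjects)
for (j in seq_len(n_subjects)) {
    ## each subject independently draws 100 of the 300 questions
    design[sample(n_questions, per_subject), j] <- 1L
}

colSums(design)          # every subject has exactly 100 questions
table(rowSums(design))   # questions are answered by varying numbers of subjects
```

The column sums are fixed by construction, but the row sums only average 4, which is the problem being asked about.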


Solution

  • Because this is a problem that comes up in community ecology (generating "null communities" with observed marginals), you can do it with the permatswap() function in the vegan package.

    Generate a (non-random) binary matrix with the desired marginals:

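    basemat <- matrix(0,nrow=300,ncol=12)  ## rows = questions, cols = subjects
    nq <- 100  ## number of questions per subject
    qs <- ncol(basemat)*nq/nrow(basemat) ## number of subjects per question (= 4)
    for (i in 1:ncol(basemat)) {
        ## subjects 1-4 get questions 1-100, subjects 5-8 get 101-200, etc.
        basemat[seq_len(nq)+(nq*((i-1) %/% qs)),i]  <- 1
    }
    ## check margins
    all(rowSums(basemat)==qs)  ## each question has qs subjects
    all(colSums(basemat)==nq)  ## each subject has nq questions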
    

    Now swap:

    library("vegan")
    pp <- permatswap(basemat,times=1)
    pp$perm[[1]]  ## extract matrix
    

    This generates one random binary matrix with the specified margins. This is a fairly difficult computational problem: depending on how important the properties of the randomization are to you, you should at least use image() on the result to check visually that it looks scrambled, and strongly consider digging through the ?permatswap and ?make.commsim help pages from vegan to get an understanding of some of the technical issues ...

    You might also be able to find a solution by searching the literature on Latin square designs. (In R: library("sos"); findFn("latin square"))
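
To illustrate the idea behind swap-based randomization, here is a minimal base-R sketch of a "checkerboard swap". It is not vegan's implementation: `swap_once()` and `n_swaps` are hypothetical names, the number of swaps is an arbitrary choice, and no claim is made that this samples uniformly from all matrices with these margins. It only shows why swapping 2x2 submatrices leaves the row and column sums unchanged.

```r
## Illustrative sketch only; swap_once() and n_swaps are hypothetical, not vegan functions
set.seed(1)  # arbitrary seed, only for reproducibility

## Block-structured start: 3 blocks of 100 questions, 4 subjects per block
m <- matrix(0L, nrow = 300, ncol = 12)
for (j in seq_len(ncol(m))) {
    m[seq_len(100) + 100 * ((j - 1) %/% 4), j] <- 1L
}

swap_once <- function(m) {
    r  <- sample(nrow(m), 2)   # two distinct rows
    cc <- sample(ncol(m), 2)   # two distinct columns
    sub <- m[r, cc]
    ## a 2x2 "checkerboard" has 1s on one diagonal and 0s on the other;
    ## flipping it keeps every row sum and column sum the same
    if (sub[1, 1] == sub[2, 2] && sub[1, 2] == sub[2, 1] &&
        sub[1, 1] != sub[1, 2]) {
        m[r, cc] <- 1L - sub
    }
    m
}

n_swaps <- 50000  # assumed number of attempted swaps, not a tuned value
for (k in seq_len(n_swaps)) m <- swap_once(m)

## margins are still exactly as required
stopifnot(all(rowSums(m) == 4), all(colSums(m) == 100))
```

Calling `image(t(m))` afterwards gives the kind of visual check suggested above: the three solid blocks of the starting matrix should no longer be visible.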