r, dataframe, random, dplyr, sample

Nested sampling of a data.frame in R


In the data.frame p below, there are 757 unique district names (dname) and 5210 unique school names (sname).

How can I sample 126 schools (sname) from 40 districts (dname) in R?

That is, for the final sample (say X), dim(table(X$dname, X$sname)) must return [1] 40 126.

In a sense, this is multi-stage sampling, so I'm open to using any package.

p <- read.csv("https://raw.githubusercontent.com/hkil/m/master/a.csv")

Solution

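  • One option is rejection sampling: draw 40 districts at random, keep all of their schools, and repeat until the draw contains exactly 126 unique schools. Note that the loop never terminates if no set of 40 districts has exactly 126 schools in total.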

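    unq_dname <- unique(p$dname)
    repeat {
      # draw 40 districts and keep every row that belongs to them
      out <- subset(p, dname %in% sample(unq_dname, 40))
      # accept the draw only if it contains exactly 126 distinct schools
      if (length(unique(out$sname)) == 126) break
    }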
    

    You can then check the dimensions with

    dim(with(out, table(dname, sname)))
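
As an alternative to the retry loop above, here is a minimal sketch of an explicit two-stage draw. Unlike the loop, it does not keep every school of the chosen districts: it first draws 40 districts, then takes one school from each district (so all 40 stay represented) and fills up to 126 from the remaining schools in those districts. It assumes every sname belongs to exactly one dname. The function two_stage_sample, its arguments, and the toy data frame are hypothetical names and made-up data for illustration, not part of the original question or answer.

```r
# Toy stand-in for `p` (made-up data): 100 districts with 1 to 8 schools each,
# three rows per school. School names are unique across districts by construction.
set.seed(1)
n_schools <- sample(1:8, 100, replace = TRUE)
schools <- data.frame(
  dname = rep(sprintf("D%03d", 1:100), times = n_schools),
  sname = sprintf("S%04d", seq_len(sum(n_schools))),
  stringsAsFactors = FALSE
)
toy <- schools[rep(seq_len(nrow(schools)), each = 3), ]
toy$score <- rnorm(nrow(toy))

# Hypothetical helper: sample n_s schools spread over n_d districts.
two_stage_sample <- function(data, n_d = 40, n_s = 126, max_tries = 1000) {
  stopifnot(n_s >= n_d)
  # one row per district/school pair
  pairs <- unique(data[c("dname", "sname")])
  all_d <- unique(pairs$dname)

  # Stage 1: draw districts until they jointly hold at least n_s schools
  tries <- 0
  repeat {
    tries <- tries + 1
    if (tries > max_tries) stop("no draw of districts had enough schools")
    d <- sample(all_d, n_d)
    pool <- pairs[pairs$dname %in% d, ]
    if (nrow(pool) >= n_s) break
  }

  # Stage 2a: one school per district, so every district is represented
  idx <- seq_len(nrow(pool))
  first <- vapply(split(idx, as.character(pool$dname)),
                  function(i) i[sample.int(length(i), 1)], integer(1))

  # Stage 2b: fill up to n_s schools from the rest of the pool
  rest <- setdiff(idx, first)
  extra <- rest[sample.int(length(rest), n_s - n_d)]
  keep <- pool$sname[c(first, extra)]

  # return all rows of the selected schools
  data[data$dname %in% d & data$sname %in% keep, ]
}

X <- two_stage_sample(toy)
dim(table(X$dname, X$sname))  # 40 126
```

With the real data, the call would be two_stage_sample(p). Keep in mind that schools do not all have the same selection probability under this scheme, because one school per district is forced into the sample, so it may not suit every survey design.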