
R: a factor sampling method


I would like to sample directly from all combinations of the following object:

v <- c("piment","aubergine","carotte","oignon","chou","pommeDeTerre","Na")
n <- 12

## TEST 1: crashes R
tmp <- data.frame(matrix(rep(v,n), ncol = n))
expand.grid(tmp)

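But this would produce a data frame that is far too large: with 7 values in 12 columns there are 7^12 ≈ 1.38e+10 combinations (2.176782336e+9 is 6^12, the count if "Na" is left out). So I have to work with a sample instead, but I don't know how to make it representative of my population.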
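The size follows directly from how expand.grid() works: it creates one row per combination, so the result has length(v)^n rows. A short check, using rep(list(v), n), which builds the same 12 inputs as the data.frame construction above. The memory figure is a rough lower bound that assumes expand.grid() stores each column as a factor (4-byte integer codes) and ignores R's per-object overhead.

```r
v <- c("piment", "aubergine", "carotte", "oignon", "chou", "pommeDeTerre", "Na")
n <- 12

# Number of rows expand.grid() would create: one per combination
n_rows <- length(v)^n  # 7^12 = 13,841,287,201 combinations

# Rough memory for the factor codes alone: rows * columns * 4 bytes
gb <- n_rows * n * 4 / 1e9  # about 664 GB

# Sanity check of the formula on a size that fits easily in memory
small <- expand.grid(rep(list(v), 3))
stopifnot(nrow(small) == length(v)^3)  # 343 rows
```

This is why the full grid cannot be materialized on an ordinary machine, even before counting the character labels or any intermediate copies.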


Solution

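  • Each row of the full grid is a sequence of 12 values, each taken from v. Sampling 12 values from v with replacement therefore gives one combination drawn uniformly at random from the grid, which is a valid bootstrap sample. Repeat this resampling B times and you have a (non-parametric) bootstrap sample of the grid that is representative by construction, without ever building the grid itself.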

    Always fix the random number generator seed with set.seed(), so that the resampling can be reproduced exactly.

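    set.seed(1) # fix the random number generator for reproducibility
    B <- 1e4L   # number of resampled combinations
    
    # Draw B vectors of n_sample values from `data`, with replacement;
    # returns a list of length B
    boot_fun <- function(data, B, n_sample) {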
      boot <- replicate(
        B,
        data[sample(seq_along(data), n_sample, replace = TRUE)],
        simplify = FALSE
      )
      return(boot)
    }
    out <- boot_fun(data = v, B = B, n_sample = 12L)
    
    #> out[[1]]
    #[1] "chou"         "pommeDeTerre" "chou"         "chou"         "chou"         "pommeDeTerre" "carotte"      "chou"        
    #[9] "piment"       "carotte"      "piment"       "oignon"
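The resampling above draws with replacement, so the same 12-value combination could in principle appear twice (very unlikely with 10^4 draws out of 7^12). If you need distinct rows, i.e. a simple random sample without replacement from the full expand.grid, one option is to sample row numbers and decode each into its values. This is a minimal sketch, not the method from the answer above: decode_row() is a hypothetical helper, and it assumes a recent R version in which sample.int() accepts n larger than .Machine$integer.max.

```r
v <- c("piment", "aubergine", "carotte", "oignon", "chou", "pommeDeTerre", "Na")
n <- 12L
B <- 1e4L

# Hypothetical helper: return row k (1-based) of expand.grid(rep(list(v), n))
# without building the grid. expand.grid() varies its first column fastest,
# so column j of row k is digit j-1 of (k - 1) written in base length(v).
decode_row <- function(k, v, n) {
  m <- length(v)
  digits <- ((k - 1) %/% m^(0:(n - 1))) %% m
  v[digits + 1]
}

set.seed(1)
# Draw B distinct row numbers from 1..7^12 (sampling without replacement)
idx <- sample.int(length(v)^n, B)

# One row per sampled index: B x n character matrix
out <- t(vapply(idx, decode_row, character(n), v = v, n = n))

# Representativeness check: at any position, each value should appear
# in roughly 1/7 (about 0.143) of the sampled rows
round(prop.table(table(out[, 1])), 3)
```

Because each row number maps to exactly one combination, the marginal frequencies per column should come out close to uniform, which is a quick way to confirm the sample looks like the population.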