Search code examples
rsampling

Using a sample list as a template for sampling from a larger list with wraparound


Similar to my question at Using a sample list as a template for sampling from a larger list without wraparound, how can I know do this allowing for a wrap-around?

Thus, if I have a vector of letters:

> all <- letters
> all
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"

and then I define a reference sample from letters as follows:

> refSample <- c("j","l","m","s")

in which the spacing between elements is 2 (1st to 2nd), 1 (2nd to 3rd) and 6 (3rd to 4th), how can I then select n samples from all that have identical, wrap-around spacing between its elements to refSample? For example, "a","c","d","j", "q" "s" "t" "z" and "r" "t" "u" "a" would be valid samples, but "a","c","d","k" would not.

Again, parameterised for a function is best.


Solution

  • I would have left it as an exercise but here goes --

    all <- letters                                                                                                                                                                                                                                                                
    refSample <- c("j","l","m","s")    
    
    pick_matches <- function(n, ref, full, wrap = FALSE) {                                                                                                                                                                                                                        
      iref <- match(ref,full)                                                                                                                                                                                                                                                     
      spaces <- diff(iref)                                                                                                                                                                                                                                                        
      tot_space <- sum(spaces)                                                                                                                                                                                                                                                    
      N <- length( full ) - 1                                                                                                                                                                                                                                                     
      max_start <- N  - tot_space*(1-wrap)                                                                                                                                                                                                                                        
      starts <- sample(0:max_start, n, replace = TRUE)                                                                                                                                                                                                                            
      return( sapply( starts, function(s) full[ 1 + cumsum(c(s, spaces)) %% (N+1)  ] ) )                                                                                                                                                                                          
    }                                                                                                                                                                                                                                                                             
    
    
    > set.seed(1)                                                                                                                                                                                                                                                                 
    
    
    
    > pick_matches(5, refSample, all, wrap = FALSE)                                                                                                                                                                                                                              
          [,1] [,2] [,3] [,4] [,5]                                                                                                                                                                                                                                                
     [1,] "e"  "g"  "j"  "p"  "d"                                                                                                                                                                                                                                                 
     [2,] "g"  "i"  "l"  "r"  "f"                                                                                                                                                                                                                                                 
     [3,] "h"  "j"  "m"  "s"  "g"                                                                                                                                                                                                                                                 
     [4,] "n"  "p"  "s"  "y"  "m"          
    
    > pick_matches(5, refSample, all, wrap = TRUE)                                                                                                                                                                                                                               
          [,1] [,2] [,3] [,4] [,5]                                                                                                                                                                                                                                                
     [1,] "x"  "y"  "r"  "q"  "b"                                                                                                                                                                                                                                                 
     [2,] "z"  "a"  "t"  "s"  "d"                                                                                                                                                                                                                                                 
     [3,] "a"  "b"  "u"  "t"  "e"                                                                                                                                                                                                                                                 
     [4,] "g"  "h"  "a"  "z"  "k"