
R: take 2 random samples of length n from a vector of length n so that they never match at the same index


Say I have a vector named all_combinations with numbers from 1 to 20.

I need to extract 2 vectors (coding_1 and coding_2) of length equal to number_of_peptide_clusters, which happens to be 20 as well in my current case.

The 2 new vectors should be randomly sampled from all_combinations so that they never hold the same value at the same index position.

I do the following:

set.seed(3)
all_combinations <- 1:20
number_of_peptide_clusters <- 20
coding_1 <- sample(all_combinations, number_of_peptide_clusters, replace = FALSE)
coding_1
#>  [1]  5 12  7  4 10  8 11 15 17 16 18 13  9 20  2 14 19  1  3  6
coding_2 <- sample(all_combinations, number_of_peptide_clusters, replace = FALSE)
coding_2
#>  [1]  5  9 19 16 18 12  8  6 15  3 13 14  7  2 11 20 10  4 17  1

This is the example that gives me trouble, because only one number overlaps at the same index (5 at position 1).

What I would do in these cases is spot the overlapping numbers and resample them out of the list of all overlapping numbers...

Imagine coding_1 and coding_2 were:

coding_1
 [1]  5 9  7  4 10  8 11 15 17 16 18 13  12 20 2  14 19  1  3  6
coding_2
 [1]  5 9 19 16 18 12  8  6 15  3 13 14  7  2  11 20 10  4 17  1

In this case 5 and 9 overlap at the same positions, so I would resample them in coding_2 from the set of overlapping values: index 1 gets a value from c(5, 9) that isn't 5, and index 2 gets one that isn't 9. So coding_2 would be:

coding_2
 [1]  9 5 19 16 18 12  8  6 15  3 13 14  7  2  11 20 10  4 17  1
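
A minimal sketch of the repair described above, using the hypothetical vectors from this example. It uses a deterministic rotation of the clashing values rather than a random resample (with exactly two clashes, the swap is the only valid outcome anyway). The name `clash` is introduced here for illustration only.

```r
# Vectors from the hypothetical example (clashes at positions 1 and 2)
coding_1 <- c(5, 9, 7, 4, 10, 8, 11, 15, 17, 16, 18, 13, 12, 20, 2, 14, 19, 1, 3, 6)
coding_2 <- c(5, 9, 19, 16, 18, 12, 8, 6, 15, 3, 13, 14, 7, 2, 11, 20, 10, 4, 17, 1)

# Positions where both vectors hold the same value
clash <- which(coding_1 == coding_2)
clash
#> [1] 1 2

# With at least two clashes, rotating the clashing values by one position
# guarantees that none of them stays where it was
if (length(clash) >= 2) {
  coding_2[clash] <- coding_2[c(clash[-1], clash[1])]
}

any(coding_1 == coding_2)
#> [1] FALSE
```

With a single clash, `length(clash)` is 1 and there is nothing to rotate, which is exactly the situation that causes trouble.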

However, in the particular case above I cannot use this approach, because a single overlapping value has nothing to swap with. So what would be the best way to obtain 2 samples of length 20 from a vector of length 20, so that the samples never match at the same index position?

It would be great if I could obtain the second sample, coding_2, already knowing coding_1. Otherwise, obtaining both at the same time would also be acceptable if it makes things easier. Thanks!


Solution

  • I think the best solution is simply to use a rejection strategy, that is, draw both samples and redraw until no position holds the same value in both:

    set.seed(3)
    all_combinations <- 1:20
    number_of_peptide_clusters <- 20
    count <- 0
    repeat {
      count <- count + 1
      message("Try number ", count)
      coding_1 <- sample(all_combinations, number_of_peptide_clusters, replace = FALSE)
      coding_2 <- sample(all_combinations, number_of_peptide_clusters, replace = FALSE)
      if (!any(coding_1 == coding_2))
        break
    }
    #> Try number 1
    #> Try number 2
    #> Try number 3
    #> Try number 4
    #> Try number 5
    #> Try number 6
    #> Try number 7
    #> Try number 8
    #> Try number 9
    coding_1
    #>  [1] 18 16 17 12 13  8  6 15  3  5 20  9 11  4 19  2 14  7  1 10
    coding_2
    #>  [1]  5 20 14  2 11  6  7 10 19  8  4  1 15  9 13 17 18 16 12  3
    

    Created on 2020-11-04 by the reprex package (v0.3.0)
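
The question also asks for a way to draw coding_2 when coding_1 is already known. A possible variant of the same rejection idea, offered as a sketch rather than a definitive implementation, keeps coding_1 fixed and redraws only coding_2. The helper name `sample_without_match` and its `max_tries` argument are hypothetical and not part of base R. For distinct values, the chance that a random permutation avoids every position of the reference is close to 1/e (about 0.37), so roughly 3 tries are needed on average.

```r
# Hypothetical helper: draw a permutation of `values` that differs from
# `reference` at every index position, using plain rejection sampling
sample_without_match <- function(values, reference, max_tries = 1000) {
  stopifnot(length(values) == length(reference))
  for (i in seq_len(max_tries)) {
    # Shuffle by index so a length-1 `values` is not expanded to 1:values
    candidate <- values[sample.int(length(values))]
    if (!any(candidate == reference)) {
      return(candidate)
    }
  }
  stop("no valid sample found in ", max_tries, " tries")
}

set.seed(3)
all_combinations <- 1:20
coding_1 <- sample(all_combinations)
coding_2 <- sample_without_match(all_combinations, coding_1)

any(coding_1 == coding_2)
#> [1] FALSE
```

The `max_tries` cap is a safeguard for inputs where no valid draw exists, for example a vector of length 1.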