
Equality of two shuffling codes in R


I was wondering whether the following two ways of shuffling the numbers 1:4 are equally random, or whether one is preferable to the other in terms of randomness:

sample(rep(1:4, 10))

replicate(10, sample(1:4))

Constraint:

Regardless of the shuffling, I need an equal number of 1s, 2s, 3s, and 4s.


Solution

  • The two approaches are not equivalent: they differ in return type, speed, and randomness.


    1. Type

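    As @RicS said, the first call, sample(rep(1:4, 10)), returns a vector, while the second, replicate(10, sample(1:4)), returns a matrix.

    In the benchmark below, f2() wraps replicate() in c() to flatten that matrix into a vector, so that f1() and f2() can be compared directly.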
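
    The difference in return type can be checked directly. This is a minimal sketch using the two calls from the question; the variable names v, m, and flat, and the seed, are only illustrative:

```r
# Compare what the two calls from the question return.
set.seed(1701)  # arbitrary seed, only for reproducibility

v <- sample(rep(1:4, 10))        # one shuffle of all 40 values
m <- replicate(10, sample(1:4))  # ten independent shuffles of 1:4

is.matrix(v)  # FALSE: a plain integer vector
length(v)     # 40
is.matrix(m)  # TRUE: replicate() simplifies its results to a matrix
dim(m)        # 4 10

# c() drops the dim attribute and flattens the matrix column by column
flat <- c(m)
length(flat)  # 40
```

    If a vector is needed from the second approach, c() or as.vector() flattens the matrix column by column, which keeps each shuffle of 1:4 together as a block of four.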


    2. Time

    f1() is almost 50x faster than f2() (about 46x by median runtime in the benchmark below).

    The differences in runtime get clearer at a larger scale:

    set.seed(1701)
    
    # Functions
    f1 <- function() { sample(rep(1:4, 10000)) }
    f2 <- function() { c(replicate(10000, sample(1:4))) }
    
    # Benchmark
    microbenchmark::microbenchmark(f1(), f2())
    Unit: microseconds
     expr      min         lq       mean     median        uq       max neval cld
     f1()   671.28   820.6755   983.9417   988.7985  1046.476  2320.425   100  a 
     f2() 40588.03 43241.0270 48796.0141 45612.0740 54431.890 71117.415   100   b
    

    We see that f1() is clearly faster, exactly as @JosephClarkMcIntyre stated in the comments.

    But are they at least equal in their randomness? Let's test that!


    3. Randomness

    The output of f2() is not a random sequence.

    The Bartels rank test checks a numeric series for randomness (the null hypothesis) against nonrandomness (the alternative).

    > randtests::bartels.rank.test(as.numeric(f1_result$value))
    
        Bartels Ratio Test
    
    data:  as.numeric(f1_result$value)
    statistic = -1.26, n = 40000, p-value = 0.2077
    alternative hypothesis: nonrandomness
    

    The p-value is > 0.05, so the null hypothesis of randomness is not rejected.
    The test finds no evidence that the result of f1() is nonrandom (which is not the same as proving that it is random).

    > randtests::bartels.rank.test(as.numeric(f2_result$value))
    
        Bartels Ratio Test
    
    data:  as.numeric(f2_result$value)
    statistic = 50.017, n = 40000, p-value < 2.2e-16
    alternative hypothesis: nonrandomness
    

    The p-value is < 0.05, so the null hypothesis of randomness is rejected.
    The result of f2() is nonrandom.

    This is also evident if you look at the result of the function itself.

    > set.seed(1701)
    > replicate(10, sample(1:4))
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    [1,]    1    4    1    3    3    2    3    3    4     1
    [2,]    3    1    2    1    4    3    2    2    3     4
    [3,]    4    2    3    2    1    1    4    4    2     2
    [4,]    2    3    4    4    2    4    1    1    1     3
    

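    It produces a matrix with ten columns, each containing the numbers 1:4 exactly once. Read as one sequence, every block of four values is therefore a permutation of 1:4, so the sequence as a whole is not a random arrangement of all 40 values.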
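
    The block structure can also be shown with base R alone, without a formal test. The following is a sketch under the assumption that f1() and f2() are defined as in the benchmark above; x1, x2, and blocks are illustrative names. Both results meet the equal-count constraint, but only f2() is restricted to runs of at most two identical neighbouring values, because a value can repeat only across a block boundary:

```r
set.seed(1701)

# Same definitions as in the benchmark above
f1 <- function() sample(rep(1:4, 10000))
f2 <- function() c(replicate(10000, sample(1:4)))

x1 <- f1()
x2 <- f2()

# Both satisfy the constraint: each value occurs exactly 10000 times.
table(x1)
table(x2)

# Longest run of identical neighbouring values.
max(rle(x1)$lengths)  # no structural limit; in practice larger than 2
max(rle(x2)$lengths)  # never more than 2

# Every consecutive block of four in x2 is a permutation of 1:4.
blocks <- matrix(x2, nrow = 4)
all(apply(blocks, 2, function(col) identical(sort(col), 1:4)))  # TRUE
```

    So if only the overall counts must be equal, sample(rep(1:4, 10)) is the unconstrained shuffle; the replicate() version additionally forces balance within every block of four, which is a different (blocked) design rather than a more random one.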