I was wondering whether the following two ways of shuffling the numbers 1:4 are equally random, or whether one is preferable to the other in terms of randomness:
sample(rep(1:4, 10))
replicate(10, sample(1:4))
Whichever approach I use, I need an equal number of 1s, 2s, 3s, and 4s.
Those functions are not equal in any way.
f1() outputs a vector, f2() outputs a matrix.
As @RicS said, the first returns a vector, the second one returns a matrix.
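A minimal sketch to see the structural difference for yourself, using the two calls from the question. The names `a` and `b` are just placeholders I picked for illustration. It also confirms that both approaches satisfy the "equal number of each value" requirement:

```r
set.seed(1701)

a <- sample(rep(1:4, 10))        # one shuffle of the pooled 40 values
b <- replicate(10, sample(1:4))  # ten separate shuffles of 1:4, one per column

is.matrix(a)  # FALSE: a plain integer vector of length 40
dim(b)        # 4 10: a 4 x 10 matrix

# Both contain each of 1, 2, 3, 4 exactly ten times
table(a)
table(c(b))   # c() flattens the matrix column by column
```

So the equal-counts constraint holds either way; the difference lies in the shape of the result and in how the values are arranged.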
f1() is almost 50x faster than f2().
The differences in runtime get clearer at a larger scale:
set.seed(1701)
# Functions
f1 <- function() { sample(rep(1:4, 10000)) }
f2 <- function() { c(replicate(10000, sample(1:4))) }
# Benchmark
microbenchmark::microbenchmark(f1(), f2())
Unit: microseconds
 expr      min         lq       mean     median        uq       max neval cld
 f1()   671.28   820.6755   983.9417   988.7985  1046.476  2320.425   100   a
 f2() 40588.03 43241.0270 48796.0141 45612.0740 54431.890 71117.415   100   b
We see that f1() is clearly faster, exactly as @JosephClarkMcIntyre stated in the comments.
But are they at least equal in their randomness? Let's test that!
f2() is not random.
The Bartels rank test checks a series of numeric values for randomness versus nonrandomness.
> randtests::bartels.rank.test(as.numeric(f1_result$value))
Bartels Ratio Test
data: as.numeric(f1_result$value)
statistic = -1.26, n = 40000, p-value = 0.2077
alternative hypothesis: nonrandomness
The p-value is > 0.05, so the null hypothesis of randomness is not rejected.
The result of f1() is not shown to be nonrandom (which is not the same as being sure it is random).
> randtests::bartels.rank.test(as.numeric(f2_result$value))
Bartels Ratio Test
data: as.numeric(f2_result$value)
statistic = 50.017, n = 40000, p-value < 2.2e-16
alternative hypothesis: nonrandomness
The p-value is < 0.05, so the null hypothesis of randomness is rejected.
The result of f2() is nonrandom.
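For some intuition about what the test picks up, here is a rough base-R sketch of the quantity behind it: the rank version of von Neumann's ratio, i.e. the sum of squared differences between successive ranks divided by the sum of squared deviations of the ranks from their mean. For a random sequence this ratio sits near 2. The helper `rvn()` and the variables `x1` and `x2` are my own names, not part of randtests, and as far as I understand the statistic that randtests prints is a standardized version of this ratio, so the numbers will not match the output above directly.

```r
# Rank version of von Neumann's ratio (hypothetical helper, not from randtests)
rvn <- function(x) {
  r <- rank(x)  # ties receive average ranks
  sum(diff(r)^2) / sum((r - mean(r))^2)
}

set.seed(1701)
x1 <- sample(rep(1:4, 10000))            # pooled shuffle
x2 <- c(replicate(10000, sample(1:4)))   # block-wise shuffles, flattened

rvn(x1)  # expected to be close to 2
rvn(x2)  # expected to be clearly above 2 (roughly 2.5 for this setup)
```

A ratio above 2 means neighbouring values differ more than they would by chance. That fits the block-wise version: within each block of four, a value can never be followed by itself.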
This is also evident if you look at the result of the function itself.
> set.seed(1701)
> replicate(10, sample(1:4))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    4    1    3    3    2    3    3    4     1
[2,]    3    1    2    1    4    3    2    2    3     4
[3,]    4    2    3    2    1    1    4    4    2     2
[4,]    2    3    4    4    2    4    1    1    1     3
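It produces a matrix with ten columns, each column containing exactly the numbers 1:4. Every block of four values is therefore forced to be a permutation of 1:4, so the sequence as a whole is not a random shuffle of all forty values.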
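This block structure can be checked directly. The sketch below is only an illustration: `is_blockwise_permutation()` is a helper name I made up, and it assumes the vector length is a multiple of the block size `k`.

```r
# TRUE if every consecutive block of k values is a permutation of 1:k
is_blockwise_permutation <- function(x, k = 4) {
  blocks <- matrix(x, nrow = k)  # one block per column
  all(apply(blocks, 2, function(col) identical(sort(col), seq_len(k))))
}

set.seed(1701)
pooled    <- sample(rep(1:4, 10))           # first approach
blockwise <- c(replicate(10, sample(1:4)))  # second approach, flattened

is_blockwise_permutation(blockwise)  # always TRUE, by construction
is_blockwise_permutation(pooled)     # TRUE only by a very unlikely coincidence

# A value can repeat at most twice in a row in the block-wise version
# (end of one block, start of the next); the pooled shuffle has no such cap.
max(rle(blockwise)$lengths)
max(rle(pooled)$lengths)
```

If you need equal counts and an unconstrained order, the pooled shuffle is the one without this hidden pattern.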