I am trying to debug a short program, and I get a disconcerting result when sampling from the elements of a vector under some conditions. It happens towards the end of sampling, once the remaining elements of the vector have drawn down to a single value.
In the specific case I'm referring to, the vector is called remaining and contains a single element, the number 2. I would expect that any sampling of size 1 from this vector would stubbornly return 2, since 2 is the only element in the vector, but this is not the case:
Browse[2]> is.vector(remaining)
[1] TRUE
Browse[2]> sample(remaining,1)
[1] 2
Browse[2]> sample(remaining,1)
[1] 2
Browse[2]> sample(remaining,1)
[1] 1
Browse[2]> sample(x=remaining, size=1)
[1] 1
Browse[2]> sample(x=remaining, size=1)
[1] 2
Browse[2]> sample(x=remaining, size=1)
[1] 1
Browse[2]> sample(x=remaining, size=1)
[1] 1
Browse[2]> sample(x=remaining, size=1)
[1] 1
As you can see, the return value is sometimes 1 and sometimes 2.
What am I misunderstanding about the function sample()?
From help("sample"):
If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x.
So, when you have remaining = 2, then sample(remaining) is equivalent to sample(x = 1:2).
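A quick check makes the special case visible. This is only a minimal sketch: the seed, the number of draws, and the second vector c(2, 7) are arbitrary choices for illustration, not part of the original question.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

remaining <- 2

# A length-one numeric x >= 1 is treated as 1:x, so values other than 2 can appear.
draws <- replicate(1000, sample(remaining, 1))
sort(unique(draws))

# Sampling explicitly from 1:2 draws from the same set of values.
draws_explicit <- replicate(1000, sample(1:2, 1))
sort(unique(draws_explicit))

# A vector of length two or more is sampled from its own elements.
draws_vector <- replicate(1000, sample(c(2, 7), 1))
sort(unique(draws_vector))
```

The first two results should list the same values, while the third only ever contains elements of the vector itself.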
From the comments it's clear you are also looking for a way around this behavior. Here is a benchmark comparison of three mentioned alternatives:
library(microbenchmark)
# if remaining is of length one
remaining <- 2
# a: explicit if/else, b: ifelse(), c: index by sampled positions
microbenchmark(
  a = {if ( length(remaining) > 1 ) { sample(remaining) } else { remaining }},
  b = ifelse(length(remaining) > 1, sample(remaining), remaining),
  c = remaining[sample(length(remaining))])
Unit: nanoseconds
 expr  min   lq    mean median     uq   max neval cld
    a  349  489  625.12  628.0  663.5  3283   100   a
    b 1536 1886 2240.58 2025.0 2165.5 13898   100   b
    c 4051 4400 5193.41 4679.5 5064.0 38413   100   c
# If remaining is not of length one
remaining <- 1:10
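# Note: ifelse() returns a result shaped like its length-one test,
# so b yields only the first element of sample(remaining) here.
microbenchmark(
  a = {if ( length(remaining) > 1 ) { sample(remaining) } else { remaining }},
  b = ifelse(length(remaining) > 1, sample(remaining), remaining),
  c = remaining[sample(length(remaining))])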
Unit: microseconds
 expr    min      lq     mean median      uq    max neval cld
    a  5.238  5.7970  6.82703  6.251  6.9145 51.264   100   a
    b 11.663 12.2920 13.14831 12.851 13.3745 34.851   100   b
    c  5.238  5.9715  6.57140  6.426  6.8450 14.667   100   a
It looks like the suggestion from joran may be the fastest in your case if sample() is called much more often when remaining is of length > 1, and the if() {} else {} approach would be faster otherwise.
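If you also need to pass a size, as in the sample(remaining, 1) calls from the question, the indexing approach can be wrapped in a small helper. This is a sketch rather than a definitive fix: the name resample is a choice made here (the examples in help("sample") show a helper along the same lines), and it relies on sample.int() always drawing positions from 1:length(x).

```r
# Hypothetical helper (the name `resample` is an assumption of this sketch):
# sample positions rather than values, so a length-one x is never
# expanded to 1:x. Extra arguments such as size are passed to sample.int().
resample <- function(x, ...) x[sample.int(length(x), ...)]

remaining <- 2
single_draws <- replicate(100, resample(remaining, 1))
unique(single_draws)           # only the one element that is left

remaining <- c(5, 9, 12)
resample(remaining, 1)         # one of the three elements
permuted <- resample(remaining)  # all three elements in random order
```

Because the helper never hands the values themselves to sample(), it behaves the same way for vectors of length one and for longer vectors, at the cost of one extra function call.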