
Random sampling with sample() gives unexpected results


Consider the following when performing random sampling in R:

n <- 10
k <- 10 
p <- 0.10 # proportion of the k objects to subsample
probs <- c(0.30, 0.30, 0.30, rep(0.10/7, 7)) # probabilities for each of the k objects

Here, the roles of n and k are irrelevant; however, there is the condition that n >= k.

x <- sort(sample(k, size = ceiling(p * k), replace = FALSE)) # works
y <- sample(x, size = n, replace = TRUE, prob = probs[x]) # throws error

I am wondering why the function call assigned to y above throws an error.

The error I receive is:

Error in sample.int(x, size, replace, prob) : 
incorrect number of probabilities

My thinking is that the 'size' argument to sample() (i.e., n*p) cannot evaluate to 1 if the second function call (the y variable) is to work, but I haven't been able to find anything documenting this error in the help page for sample().

I know that ceiling() can act strangely in some instances, but I'm not convinced that this could be the issue.

When the above code is run, x is set to the integer data type, e.g., 1L, 2L, etc., which leads to the error in evaluating y.

Does someone have an idea on how to fix this issue?


Solution

  • If x is a single number with x >= 1, sample(x) samples from the values 1 through x (see the Details section of ?sample), or 1 through floor(x) if x isn't an integer. The prob argument therefore has to be a vector of length x. In your code, ceiling(p * k) is 1, so x is a single value and probs[x] is a vector of length 1, which causes the error whenever x is greater than 1.
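
One possible way around this, sketched under assumptions rather than offered as the definitive fix, is to sample positions with sample.int() and then index into x, so that a length-1 x is never expanded to 1:x. The helper name sample_from is hypothetical (it is not part of base R), and the seed is arbitrary; the Examples section of ?sample shows a similar resample() helper.

```r
# Hypothetical helper (not part of base R): draw from the elements of x
# itself, even when x has length 1, by sampling positions instead of values.
sample_from <- function(x, size, replace = FALSE, prob = NULL) {
  x[sample.int(length(x), size = size, replace = replace, prob = prob)]
}

set.seed(1)  # arbitrary seed, only so the run is reproducible

n <- 10
k <- 10
p <- 0.10 # proportion of the k objects to subsample
probs <- c(0.30, 0.30, 0.30, rep(0.10 / 7, 7)) # probabilities for each of the k objects

# ceiling(p * k) is 1 here, so x is a single integer between 1 and k
x <- sort(sample(k, size = ceiling(p * k), replace = FALSE))

# probs[x] has one entry per element of x, which is the length that
# sample.int(length(x), ...) expects for prob
y <- sample_from(x, size = n, replace = TRUE, prob = probs[x])
```

Because x has only one element in this setup, y is just that element repeated n times. With a larger p, x has several elements and y is drawn from them with weights probs[x], which sample.int() accepts even though they do not sum to 1.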