Consider the following when performing random sampling in R:
n <- 10
k <- 10
p <- 0.10 # proportion of the k objects to subsample
probs <- c(0.30, 0.30, 0.30, rep(0.10/7, 7)) # probabilities for each of the k objects
Here, the roles of n and k are irrelevant; however, there is the condition that n >= k.
x <- sort(sample(k, size = ceiling(p * k), replace = FALSE)) # works
y <- sample(x, size = n, replace = TRUE, prob = probs[x]) # throws error
I am wondering why the function call assigned to y above throws an error.
The error I receive is:
Error in sample.int(x, size, replace, prob) :
incorrect number of probabilities
My thinking is that the 'size' argument to sample() (i.e., n*p) cannot evaluate to 1 in the second function call (the y variable), but I haven't been able to find anything documenting this error in the help files for sample().
I know that ceiling() can act strangely in some instances, but I'm not convinced that this could be the issue.
When the above code is run, x is set to the integer data type, e.g., 1L, 2L, etc., which leads to the error in evaluating y.
Does anyone have an idea how to fix this issue?
If x is a single numeric value, sample(x) samples from the values 1 through x (see the Details section of the help), or 1 through floor(x) if x isn't an integer. So the prob argument has to be a vector of length x. In your code, probs[x] is always a vector of length 1, which causes the error whenever x is greater than 1.
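One possible workaround, sketched here under the assumption that you want to draw from the elements of x themselves, is to sample positions with sample.int() and then index into x. The helper name resample is an illustrative choice (the Examples section of the sample() help page shows a similar wrapper), and the seed is arbitrary and only there for reproducibility.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

n <- 10
k <- 10
p <- 0.10  # proportion of the k objects to subsample
probs <- c(0.30, 0.30, 0.30, rep(0.10 / 7, 7))  # probabilities for each of the k objects

# Illustrative helper (an assumption, not part of the original code):
# sample positions 1..length(x), then index into x, so a length-1 x
# is treated as a one-element set rather than as the range 1:x.
resample <- function(x, ...) x[sample.int(length(x), ...)]

x <- sort(sample(k, size = ceiling(p * k), replace = FALSE))

# prob now has length(x) entries, matching the number of positions sampled
y <- resample(x, size = n, replace = TRUE, prob = probs[x])
```

With p * k equal to 1, x holds a single value, so y is simply that value repeated n times. The same call should also work unchanged when x has several elements, because probs[x] then has one weight per element of x.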