I have defined the following variables:
a <- as.character(1:10)
b <- 100
c <- 10
probs <- c(0.3, 0.3, 0.3, rep(0.1/7, 7))
min <- 5
max <- 10
I am trying to figure out how to subset the 'probs' argument in the code below so that it corresponds to the characters being sampled (here, characters 5:10):
sample(a[min:max], size = round(b/c), replace = TRUE, prob = probs[???])
I don't think that probs[min:max] will work as it should, but I don't know how to verify whether it does. A more complicated scenario is if I want something like
a[c(1, 3, 5)]
I would then need 'probs' to correspond to characters 1, 3, and 5.
I have tried using probs[get(paste0(...))], but this is not the most direct and efficient way. It doesn't work anyway.
Any advice is appreciated.
You simply need to subset probs with the same index you use to subset a, e.g.
index = min:max
sample(a[index], size = round(b/c), replace = TRUE, prob = probs[index])
For the more complicated scenario, set index = c(1,3,5).
You can see that this works by doing a simulation and comparing the observed probabilities with the true probabilities:
set.seed(123)
tmp = sample(a[min:max], size = 10000, replace = TRUE, prob = probs[min:max])
table(tmp)/10000 # the observed probabilities
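The observed probabilities are all roughly equal, which is consistent with the true probabilities: every element of probs[min:max] is 0.1/7, so after normalization each of the six characters has probability 1/6 (about 0.167).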
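The same kind of check can be run for the non-contiguous case. This is only a sketch: the names index, expected, draws and observed are illustrative, and the expected values are computed by normalizing the subsetted weights by hand.

```r
a <- as.character(1:10)
probs <- c(0.3, 0.3, 0.3, rep(0.1/7, 7))

# Use one index vector for both the values and their weights
index <- c(1, 3, 5)

# Probabilities implied by the subsetted weights after normalization
expected <- probs[index] / sum(probs[index])
names(expected) <- a[index]

set.seed(123)
draws <- sample(a[index], size = 10000, replace = TRUE, prob = probs[index])

# Observed proportions, in the same order as a[index]
observed <- table(factor(draws, levels = a[index])) / length(draws)

round(expected, 3)
observed
```

Here the expected values work out to roughly 0.488 for "1" and "3" and 0.023 for "5", so the observed proportions should be clearly unequal, unlike the 5:10 case.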
If you take a look at the help files for sample, you'll see that prob does not need to sum to 1. The function will take care of normalizing the probabilities.
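To see the normalization in action, here is a small sketch with weights that deliberately do not sum to 1. The vectors x and w and the other names are made up for illustration and are not part of the question.

```r
# Illustrative values and weights; the weights sum to 10, not 1
x <- c("x", "y", "z")
w <- c(1, 2, 7)

set.seed(42)
draws_w <- sample(x, size = 10000, replace = TRUE, prob = w)

# Proportions implied by the weights once they are normalized
implied <- w / sum(w)

# Observed proportions, in the same order as x
observed_w <- as.numeric(table(factor(draws_w, levels = x))) / length(draws_w)

rbind(implied, observed_w)
```

The two rows should agree closely, which is what you would expect if sample treats prob as relative weights.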