
Extracting character probabilities that were randomly sampled in R


I have defined the following variables:

a <- as.character(1:10)
b <- 100
c <- 10
probs <- c(0.3, 0.3, 0.3, rep(0.1/7, 7))
min <- 5
max <- 10

I am trying to figure out how to subset 'probs' in the code below so that it corresponds to the characters being sampled (here, characters 5 through 10, i.e. a[min:max]):

sample(a[min:max], size = round(b/c), replace = TRUE, prob = probs[???])

I don't think probs[min:max] works as intended, but I'm not sure how to check whether it does. A more complicated scenario is sampling from a non-contiguous subset, such as

a[c(1, 3, 5)]

I would then need 'probs' to correspond to characters 1, 3, and 5.

I have tried probs[get(paste0(...))], but it doesn't work, and it is not a very direct or efficient approach anyway.

Any advice is appreciated.


Solution

  • You simply need to subset probs with the same index that you use to subset a, e.g.

    index = min:max
    sample(a[index], size = round(b/c), replace = TRUE, prob = probs[index])
    

    For the more complicated scenario, set index = c(1,3,5).

    You can check that this works by simulating a large sample and comparing the observed proportions with the true probabilities:

    set.seed(123)
    tmp = sample(a[min:max], size = 10000, replace = TRUE, prob = probs[min:max])
    table(tmp)/10000 # the observed probabilities
    

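    The observed proportions are all roughly 1/6 (about 0.167). This is consistent with the true probabilities: the six weights probs[5:10] are all equal (0.1/7 each), so each of the six characters has probability 1/6.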

    The help page for sample (?sample) notes that prob does not need to sum to 1: the weights only need to be non-negative and not all zero, and sample normalizes them itself.
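One way to make sure the two vectors never get out of step is to wrap the idea in a small function that applies one index to both. The sketch below assumes `a` and `probs` have the same length and are aligned element by element; the function name `sample_subset` is hypothetical, not part of base R. It shows both the contiguous case (5:10) and the non-contiguous case (c(1, 3, 5)).

```r
# Hypothetical helper (not base R): sample from a subset of `x`,
# using the matching subset of `prob` so each value keeps its own weight.
sample_subset <- function(x, prob, index, size) {
  # The approach only makes sense if x and prob are aligned element by element
  stopifnot(length(x) == length(prob))
  sample(x[index], size = size, replace = TRUE, prob = prob[index])
}

a <- as.character(1:10)
probs <- c(0.3, 0.3, 0.3, rep(0.1/7, 7))

set.seed(123)

# Contiguous subset: characters 5 to 10
draws_range <- sample_subset(a, probs, 5:10, size = 10)

# Non-contiguous subset: characters 1, 3 and 5
draws_odd <- sample_subset(a, probs, c(1, 3, 5), size = 10000)

# "1" and "3" carry weight 0.3 each, "5" only 0.1/7, so "5" should be rare:
# its expected proportion is (0.1/7) / (0.6 + 0.1/7), roughly 0.023
prop.table(table(draws_odd))
```

Because the same `index` object is used for both `x` and `prob`, there is no separate lookup (such as get(paste0(...))) that could drift out of sync.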
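The normalization point, and the simulation check, can be made a bit more explicit. In this sketch the weights probs[5:10] sum to 6 * 0.1/7 (about 0.086), not 1, yet each character is still drawn about 1/6 of the time. As an assumption beyond the original answer, it uses a chi-squared goodness-of-fit test (`chisq.test` with `rescale.p = TRUE`, so the raw weights can be passed in) instead of eyeballing the table; the sample size of 10000 is arbitrary.

```r
a <- as.character(1:10)
probs <- c(0.3, 0.3, 0.3, rep(0.1/7, 7))
index <- 5:10

w <- probs[index]
sum(w)        # the subset weights do not sum to 1
w / sum(w)    # after normalizing, each weight is 1/6

set.seed(123)
n <- 10000
tmp <- sample(a[index], size = n, replace = TRUE, prob = w)

# Observed counts, forced into the same order as a[index] (and w)
observed <- table(factor(tmp, levels = a[index]))

# Goodness-of-fit test against the weights; rescale.p = TRUE
# normalizes the raw weights w before computing expected counts
fit <- chisq.test(observed, p = w, rescale.p = TRUE)
fit$p.value   # a small value would suggest the probs did not line up
```

Using `factor(..., levels = a[index])` keeps the observed counts in the same order as `w`, which is what `chisq.test` assumes when it pairs counts with probabilities.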