Search code examples
rconditional-statementssamplingmontecarlo

How to draw from sample until at least one of each sample value is obtained and then stop


I am working on a homework problem where a cereal company is running a promotion where 1 of 4 free toys is included in each box. The goal is to collect all 4 toys through buying one box at a time.

There are two scenarios:

1.Each toy is equally likely

2.The toys have selection probabilities 0.10, 0.25, 0.25, and 0.40.

I want to design a function that takes each toy number and each toy's selection probability as inputs, simulates buying boxes of cereal by sampling from the toy numbers until at least one of each toy is obtained, then stops and reports how many boxes were purchased.

The end goal is to use this function in a Monte Carlo simulation study to find out on average how many boxes consumers will have to buy to collect all the toys, and the proportion of consumers that will have to buy at least 14 boxes to collect all the toys.

I've tried to create a loop (while, repeat) which samples until a vector contains all the toy values but the loops have run infinitely. I suspect there is a problem with the condition I am feeding the loop.

box_buyer <- function (purchase_options, probabilities) {

  boxes <- numeric()

  while (!purchase_options %in% boxes) {

    append(boxes, sample(purchase_options, 1, probabilities, replace = TRUE))

  }

  return(length(boxes))

}

box_buyer(c(1, 2, 3, 4), c(1/4, 1/4, 1/4, 1/4))

I am expecting a function that returns the number of boxes that were purchased. Currently what I get is an infinite loop that gives the error: "the condition has length > 1 and only the first element will be used" repeated until I terminate R.

How can I get the loop to sample until at least one of each toy is obtained and then stop and return the number of boxes purchased? Any help is appreciated.


Solution

  • I don't think this loop will be very efficient, because you need to restart the random number generator every time it steps through the loop. I would suggest, instead, taking a large sample of c('A', 'B', 'C', 'D') with the desired probabilities, then seeing when you first got each element, and taking the max of those. This could fail if you happened not to draw any of one type, but with a large enough sample that becomes vanishingly improbable. Here's an implementation of my suggestion.

    set.seed(1)
    
    probs <- c(.1, .25, .25, .40)
    samp <- sample(c('A', 'B', 'C', 'D'), size = 1000, replace = TRUE, prob = probs)
    
    max(c(min(which(samp == 'A')), min(which(samp == 'B')), min(which(samp == 'C')), min(which(samp == 'D'))))
    [1] 6
    samp[1:6] # we get the final missing item on draw 6
    [1] "D" "D" "C" "A" "D" "B"