r random simulation sample

unexpected sample behavior


I am currently running a small simulation and am puzzled by the results. This is my code:

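library(magrittr)  # provides the %>% pipe used below

ground_truth <- c("coke", "zero", "light", "zero")
options <- c("zero", "coke", "light")

# Build 10000 random guesses of length 4
random_guess <- lapply(1:10000, function(x) {
  first <- sample(options, 1)
  rest <- sample(options, 3)
  return(c(first, rest))
})

# Count how many positions of each guess match the ground truth
final_res <- lapply(random_guess, function(k) {
  sum(k == ground_truth)
}) %>% unlist()
table(final_res)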

I noticed that no matter how often I generate random guesses, the code never returns a perfect match with the ground truth, which seems unlikely (if not impossible). I made a small adjustment to the code, changing return(c(first, rest)) to return(c(rest, first)), and now I do occasionally get a full match with the ground truth. This really confuses me, since I thought everything is picked randomly anyway. Changing the line to return(sample(c(first, rest))) also returns the correct results.

Any insights into why this is happening are very much appreciated!


Solution

  • Your ground truth is

    "coke" then "zero", "light", "zero"
    

    Your random guess picks one of "zero", "coke", "light" and then appends a random shuffle of all three options, so the last three values are always some permutation of those three. But the last three values of your ground truth contain "zero" twice. By default, sample never picks a value twice. Did you want to sample with replacement? If so, use rest <- sample(options, 3, replace=TRUE), but that means you may get something like "coke", "zero", "zero", "zero", so check whether that makes sense for your simulation. It's not clear exactly what type of process you are trying to model. If you change to return(c(rest, first)), there is at least a chance of getting "zero" twice in the last three values, which can never happen just by shuffling options.

    Your random guesses are not that random, because the last three values never contain a duplicate. To mix things up, try something like this instead:

    random_guess <- lapply(1:10000, function(x){
        dup_value <- sample(options,1)
        # combine the duplicated value with options and shuffle all four
        sample(c(dup_value, options))
    })
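
To check the explanation above empirically, here is a minimal sketch that compares the original guessing scheme with the shuffled one. The helper names (guess_original, guess_shuffled, count_full_matches) and the seed are assumptions made for illustration only; they are not part of the original code.

```r
set.seed(42)  # arbitrary seed, used only to make the run reproducible

ground_truth <- c("coke", "zero", "light", "zero")
options <- c("zero", "coke", "light")
n <- 10000

# Original scheme: one random pick followed by a permutation of all options.
guess_original <- function() c(sample(options, 1), sample(options, 3))

# Scheme from the answer: duplicate one option, then shuffle all four values.
guess_shuffled <- function() sample(c(sample(options, 1), options))

# Count how many of n guesses match the ground truth in every position.
count_full_matches <- function(guess_fun) {
  sum(replicate(n, all(guess_fun() == ground_truth)))
}

original_matches <- count_full_matches(guess_original)
shuffled_matches <- count_full_matches(guess_shuffled)

# Check whether the last three values of the original scheme ever repeat.
any_dup <- any(replicate(n, anyDuplicated(guess_original()[-1]) > 0))

print(c(original = original_matches, shuffled = shuffled_matches))
print(any_dup)
```

The original scheme should report zero full matches and no duplicates among its last three values, since those are always a permutation of options. For the shuffled scheme, a full match needs "zero" to be the duplicated value (probability 1/3) and one specific ordering out of the 12 distinct arrangements, so roughly 1 in 36 guesses should match.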