Search code examples
rdice

Estimate the chance n rolls of m fair six-sided dice


Similar with De mere problem

I want to generate a Monte Carlo simulation to estimate the probability of rolling at least one 1 from n rolls of m fair six-sided dice.

My code:

m<-5000
n<-3
x<-replicate(m, sample(1:6,n,TRUE)==1)
p<-sum(x)/m

p is the probability estimated. Here I get the value 0.4822.

My questions:

1) Is there any other way without using sum to do it?

2) I doubt the code is wrong as the probability maybe too high.


Solution

  • Although the question as stated is a little unclear, the code suggests you want to estimate the chance of obtaining at least one outcome of "1" among n independent dice and that you aim to estimate this by simulating the experiment m times.

    Program simulations from the inside out. Begin with a single iteration. You started well, but to be perfectly clear let's redo it using a highly suggestive syntax. Try this:

    1 %in% sample(1:6,n,TRUE)
    

    This uses sample to realize the results of n independent fair dice and checks whether the outcome 1 appears among any of them.

    Once you are satisfied that this emulates your experiment (run it a bunch of times), then indeed replicate will perform the simulation:

    x <- replicate(m, 1 %in% sample(1:6,n,TRUE))
    

    That produces m results. Each will be TRUE (interpreted as equal to 1) in all iterations where 1 appeared and otherwise will be FALSE (interpreted as 0). Consequently, the average number of times 1 appeared can be obtained as

    mean(x)
    

    This empirical frequency is a good estimate of the theoretical probability.


    As a check, note that 1 will not appear on a single die with a probability of 1-1/6 = 5/6 and therefore--because the n dice are independent--will not appear on any of them with a probability of (5/6)^n. Consequently the chance a 1 will appear must be 1 - (5/6)^n. Let us output those two values: the simulation mean and theoretical result. We might also include a Z score, which is a measure of how far away from the theoretical result the mean is. Typically, Z scores between -2 and 2 aren't significant evidence of any discrepancy.

    Here's the full code. Although there are faster ways to write it, this is very fast already and is about as clear as one could make it.

    m <- 5000     # Number of simulation iterations
    n <- 3        # Number of dice per iteration 
    set.seed(17)  # For reproducible results
    x <- replicate(m, 1 %in% sample(1:6,n,TRUE))
    
    # Compare to a theoretical result.
    theory <- 1-(5/6)^n
    avg <- mean(x)
    Z <- (avg - theory) / sd(x) * sqrt(length(x))
    c(Mean=signif(avg, 5), Theoretical=signif(theory, 5), Z.score=signif(Z, 3))
    

    The output is

    Mean Theoretical Z.score

    0.4132 0.4213 -1.1600

    Notice that neither result is anywhere near n/6, which would be 1/2 = 0.500.