Similar with De mere problem
I want to generate a Monte Carlo simulation to estimate the probability of rolling at least one from
n
rolls of m
fair six-sided dice.
My code:
m<-5000
n<-3
x<-replicate(m, sample(1:6,n,TRUE)==1)
p<-sum(x)/m
p is the probability estimated. Here I get the value 0.4822.
My questions:
1) Is there any other way without using sum
to do it?
2) I doubt the code is wrong as the probability maybe too high.
Although the question as stated is a little unclear, the code suggests you want to estimate the chance of obtaining at least one outcome of "1" among n
independent dice and that you aim to estimate this by simulating the experiment m
times.
Program simulations from the inside out. Begin with a single iteration. You started well, but to be perfectly clear let's redo it using a highly suggestive syntax. Try this:
1 %in% sample(1:6,n,TRUE)
This uses sample
to realize the results of n
independent fair dice and checks whether the outcome 1
appears among any of them.
Once you are satisfied that this emulates your experiment (run it a bunch of times), then indeed replicate
will perform the simulation:
x <- replicate(m, 1 %in% sample(1:6,n,TRUE))
That produces m
results. Each will be TRUE (interpreted as equal to 1) in all iterations where 1
appeared and otherwise will be FALSE (interpreted as 0). Consequently, the average number of times 1
appeared can be obtained as
mean(x)
This empirical frequency is a good estimate of the theoretical probability.
As a check, note that 1
will not appear on a single die with a probability of 1-1/6 = 5/6 and therefore--because the n
dice are independent--will not appear on any of them with a probability of (5/6)^n. Consequently the chance a 1
will appear must be 1 - (5/6)^n. Let us output those two values: the simulation mean and theoretical result. We might also include a Z score, which is a measure of how far away from the theoretical result the mean is. Typically, Z scores between -2 and 2 aren't significant evidence of any discrepancy.
Here's the full code. Although there are faster ways to write it, this is very fast already and is about as clear as one could make it.
m <- 5000 # Number of simulation iterations
n <- 3 # Number of dice per iteration
set.seed(17) # For reproducible results
x <- replicate(m, 1 %in% sample(1:6,n,TRUE))
# Compare to a theoretical result.
theory <- 1-(5/6)^n
avg <- mean(x)
Z <- (avg - theory) / sd(x) * sqrt(length(x))
c(Mean=signif(avg, 5), Theoretical=signif(theory, 5), Z.score=signif(Z, 3))
The output is
Mean Theoretical Z.score
0.4132 0.4213 -1.1600
Notice that neither result is anywhere near n/6
, which would be 1/2 = 0.500.