I am visiting a restaurant that has a menu with N dishes. Every time I visit, I pick one dish at random. I am wondering: what is the average time (number of visits) until I have tasted all N dishes?
I think that the number of dishes I have tasted after n visits to the restaurant is a Markov chain with transition probabilities:
p_{k,k+1} = (N-k)/N
and
p_{k,k} = k/N
for k =0,1,2,...,N
I want to simulate this process in R (I need help here). Given that the restaurant has 100 dishes, I wrote:
nits <- 1000 #simulate the problem 1000 times
count <- 0
N = 100 # number of dishes
for (i in 1:nits){
  x <- 1:N
  while(length(x) > 0){
    x <- x[x != sample(x=x,size=1)] # pick one dish at random that I have not tasted
    count <- count + 1/nits
  }
}
count
I would like some help because my mathematical result is that the average time is N*log(N), but the code above produces different results.
You have two issues.

First, you have a loop index i, but you don't use i inside the loop. Set up a structure to hold the results of every iteration:

results = integer(length = nits)
...
for (i in 1:nits){
  ...
  while(length(x) > 0){
    ...
  }
  results[i] <- count
}
Second, the problem says "pick one dish at random", but your code's comment says "pick one dish at random that I have not tasted". If you always pick a dish you have not tasted, then the problem is trivial: it will take exactly N visits. Let's adjust your code to pick one dish at random, whether you have tasted it or not:
nits <- 1000                  # simulate the problem 1000 times
results <- integer(length = nits)
N <- 100                      # number of dishes
for (i in 1:nits){
  dishes <- 1:N
  tasted <- rep(0, N)         # indicator: has dish j been tasted yet?
  count <- 0
  while(sum(tasted) < N){
    tasted[sample(dishes, size = 1)] <- 1  # pick any dish, tasted or not
    count <- count + 1
  }
  results[i] <- count
}
results
Looking at the results, I think you may have made a math error:
100 * log(100)
# [1] 460.517
mean(results)
# [1] 518.302
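As an aside, since you set the process up as a Markov chain, here is a minimal alternative sketch (my own variant, not part of the fix above; sim_once and results2 are just illustrative names): the number of visits spent in state k before moving to k+1 is geometric with success probability (N-k)/N, so one run of the whole process is a sum of N geometric waiting times.
# Sketch only: simulate the Markov chain from your question directly.
# Visits spent at state k before finding a new dish ~ Geometric((N - k)/N).
nits <- 1000
N <- 100
sim_once <- function(N) {
  k <- 0:(N - 1)                          # states 0, 1, ..., N-1
  sum(rgeom(N, prob = (N - k) / N) + 1)   # rgeom counts failures, so add 1 visit per state
}
results2 <- replicate(nits, sim_once(N))
mean(results2)
This gives estimates in the same range as the loop-based simulation above.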
You can read more about this problem on Wikipedia: Coupon Collector's Problem. Using the result there, the simulation is doing quite well:
100 * log(100) + .577 * 100 + 0.5
# [1] 518.717
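To connect the numbers: the coupon collector result gives the exact expectation as N times the N-th harmonic number, E[T] = N*(1 + 1/2 + ... + 1/N), and the 0.577 above is the Euler-Mascheroni constant from the asymptotic approximation. A quick sanity check in R (just a sketch):
N <- 100
N * sum(1 / (1:N))   # exact expectation N * H_N, roughly 518.74, in line with the approximation above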