I would like to model a sampling with replacement in R (like the urn model). That is, I have three different events (say: 1,2 and 3 (in fact they a categorical but I think this is not important at the moment)) and I know the probability of each event:
1 --> 0.5
2 --> 0.2
3 --> 0.3
Now I would like to take for example 50 samples with replacement and I would like to know the probabilities for each possible combination of the three different events.
My idea was to use rmultinom
to generate these samples.
rmultinom(n=50,size=3,prob=c(0.5,0.2,0.3))
Now I get 50 randomly (?) Chosen samples, but I Need all possible combinations when I take 50 samples with replacement.
If I'm understanding you correctly, the probabilities you're after can be calculated analytically.
My sense is that you want to treat all draws with the same numbers of 1s, 2s, and 3s as equivalent (see below if not). That is, 49 straight 1s followed by a 2 counts as the same 50-draw "outcome" as a 2 followed by 49 straight 1s.
In that case, what you're looking for is the value of the multinomial probability mass function evaluated for (p1 = 0.5, p2 = 0.2, p3 = 0.3) and at counts c1, c2, and c3, of 1s, 2s, and 3s (these should sum to 50). You can evaluate the multinomial PMF in R as:
counts = c(c1, c2, c3)
myProbs = c(0.5, 0.2, 0.3)
dmultinom(x = counts, prob = myProbs)
Now all that remains is to enumerate the all the possible combinations of 1s, 2s, and 3s that can occur in 50 draws. Calling the function nsimplex(3,50)
(from the combinat
package) tells us that there are 1326 of these, and calling the function xsimplex(3,50)
(found in the same package) will generate them for us in matrix form. Here are the first five of the 1326 columns:
[,1] [,2] [,3] [,4] [,5]
[1,] 50 49 49 48 48
[2,] 0 1 0 2 1
[3,] 0 0 1 0 1
We then simply need to evaluate dmultinom for each column, using apply
column-wise:
mySimplex = xsimplex(3, 50)
myProbs = c(0.5, 0.2, 0.3)
results = apply(mySimplex, 2, dmultinom, prob = myProbs)
The nth entry in the vector results
will be the probability of the counts in the nth column of the matrix mySimplex
.
Was this what you were after?
Distinct permutations: If you want to count distinct permutations differently, then the probability of any individual permutation is just:
0.5^(c_1) * 0.2^(c_2) * 0.3^(c_3)
where c_1
is the number of 1s, c_2
the number of 2s, and c_3
the number of threes in the draw. But if you're looking to enumerate all of them, you might want to think again! The number of possible unique length-50 sequences in which each character is a 1, a 2, or a 3 is 3^50 > 10^23.