I am trying to generate data from a multinomial distribution in R using the function rmultinom, but I am having some problems. The fact is that I want a data frame of 50 rows and 20 columns and a total sum of the outcomes equal to 3 times n*p.
I am using this code:
p <- 20
n <- 50
N <- 3*(n*p)
prob_true <- rep(1/p, p)
a <- rmultinom(50, N, prob_true)
But I get some very strange results and a data frame with 20 rows and 50 columns. How can I solve this problem?
Thanks in advance!
The help available at ?rmultinom
says that n
in rmultinom(n, size, prob)
is:
"number of random vectors to draw"
And size
is:
"specifying the total number of objects that are put into K boxes in the typical multinomial experiment"
And the help says that the output is: "For rmultinom(), an integer K x n matrix where each column is a random vector generated according to the desired multinomial law, and hence summing to size"
So you're asking for 50 vectors/variables with a total number of "objects" equal to 3000, so each column is drawn as a vector that sums to 3000.
colSums(a)
does result in 3000
.
Do you want your vectors/variables as rows? Then this would work just by transposing a
:
t(a)
but if you want 20 columns, each that is its own variable, you would need to switch your n
and p
(I also subbed in n
in the rmultinom
call):
n <- 20
p <- 50
N <- 3*(n*p)
prob_true <- rep(1/p, p)
a <- rmultinom(n, N, prob_true)