Search code examples
rrandomsimulationmultinomial

Generating multinomial random data in R


I am trying to generate data from a multinomial distribution in R using the function rmultinom, but I am having some problems. The fact is that I want a data frame of 50 rows and 20 columns and a total sum of the outcomes equal to 3 times n*p.

I am using this code:

p <- 20
n <- 50
N <- 3*(n*p)
prob_true <- rep(1/p, p)
a <- rmultinom(50, N, prob_true)

But I get some very strange results and a data frame with 20 rows and 50 columns. How can I solve this problem?

Thanks in advance!


Solution

  • The help available at ?rmultinom says that n in rmultinom(n, size, prob) is:

    "number of random vectors to draw"

    And size is: "specifying the total number of objects that are put into K boxes in the typical multinomial experiment"

    And the help says that the output is: "For rmultinom(), an integer K x n matrix where each column is a random vector generated according to the desired multinomial law, and hence summing to size"

    So you're asking for 50 vectors/variables with a total number of "objects" equal to 3000, so each column is drawn as a vector that sums to 3000.

    colSums(a) does result in 3000.

    Do you want your vectors/variables as rows? Then this would work just by transposing a:

    t(a)
    

    but if you want 20 columns, each that is its own variable, you would need to switch your n and p (I also subbed in n in the rmultinom call):

    n <- 20
    p <- 50
    N <- 3*(n*p)
    prob_true <- rep(1/p, p)
    a <- rmultinom(n, N, prob_true)