Search code examples
rrandomgaussiannormal-distribution

mvrnorm (from MASS) vs rmvnorm (from mvtnorm)


I am generating a large volume of data from the multivariate Normal distribution for simulation. I wonder if anyone is aware of which command is most efficient for this. If it is the mvrnorm (from the "MASS" package) or the rmvnorm (from the "mvtnorm" package).


Solution

  • Such questions can be easily answered by timing different approaches. Let

    library(microbenchmark)
    library(MASS)
    library(mvtnorm)
    
    n <- 10000
    k <- 50
    mu <- rep(0, k)
    rho <- 0.2
    Sigma <- diag(k) * (1 - rho) + rho 
    

    In this way we have 50 variables with unit variance and correlation 0.2. Generating 10000 observations we get

    microbenchmark(mvrnorm(n, mu = mu, Sigma = Sigma),
                   rmvnorm(n, mean = mu, sigma = Sigma, method = "eigen"),
                   rmvnorm(n, mean = mu, sigma = Sigma, method = "svd"),
                   rmvnorm(n, mean = mu, sigma = Sigma, method = "chol"),
                   times = 100)
    # Unit: milliseconds
    #                                                    expr      min       lq     mean   median        uq      max neval cld
    #                      mvrnorm(n, mu = mu, Sigma = Sigma) 65.04667 73.02912 85.30384 81.70611  92.69137 148.6959   100  a 
    #  rmvnorm(n, mean = mu, sigma = Sigma, method = "eigen") 71.14170 81.08311 95.12891 88.84669 100.62174 237.0012   100   b
    #    rmvnorm(n, mean = mu, sigma = Sigma, method = "svd") 71.32999 81.30640 93.40939 88.54804 104.00281 208.3690   100   b
    #   rmvnorm(n, mean = mu, sigma = Sigma, method = "chol") 71.22712 78.59898 94.13958 89.04653 108.27363 158.7890   100   b
    

    Thus, possibly mvrnorm performs slightly better. As you have a specific application in mind, you should set n, k, and Sigma to values more adequate to this application.

    As you don't seem to be restricted to those two approaches, you could look into Rcpp alternatives; see, e.g., 1, 2, 3.