Search code examples
rnormal-distributionprobability-density

Two calculation formulas of density (pdf) of a bivariate normal distribution returning different results


With the code I’m calculating the density of a bivariate normal distribution. Here I use two formulas which should return the same result.

The first formula uses the dmvnorm of the mvtnorm package and the second formula uses the formula from Wikipedia (https://en.wikipedia.org/wiki/Multivariate_normal_distribution).

When the standard deviation of both distributions equals one (the covariance matrix has only ones on primary diagonal), the results are the same. But when you vary the two entries in the covariance matrix to two or one third… the results aren’t both identical.

(I hope) I have read the help properly and also this document (https://cran.r-project.org/web/packages/mvtnorm/vignettes/MVT_Rnews.pdf).

Here on stackoverflow (How to calculate multivariate normal distribution function in R) I found this because perhaps my covariance matrix is wrong defined.

But until now I couldn’t find an answer…

So my question: Why is my code returning different results when the standard deviation not equals one?

I hope I gave enough information... but when something is missing please comment. I will edit my question.

Many thanks in advance!

And now my code:

   library(mvtnorm)  # for loading the package if necessary

    mu=c(0,0)
    rho=0
    sigma=c(1,1)  # the standard deviation which should be changed to two or one third or… to see the different results
    S=matrix(c(sigma[1],0,0,sigma[2]),ncol=2,byrow=TRUE)

    x=rmvnorm(n=100,mean=mu,sigma=S)
    dim(x)  # for control
    x[1:5,]  # for visualization

    # defining a function
    Comparison=function(Points=x,mean=mu,sigma=S,quantity=4) {
    for (i in 1:quantity) {
           print(paste0("The ",i," random point"))
           print(Points[i,])
           print("The following two results should be the same")
           print("Result from the function 'dmvnorm' out of package 'mvtnorm'")
           print(dmvnorm(Points[i,],mean=mu,sigma=sigma,log=FALSE))
           print("Result from equation out of wikipedia")
           print(1/(2*pi*S[1,1]*S[2,2]*(1-rho^2)^(1/2))*exp((-1)/(2*(1-rho^2))*(Points[i,1]^2/S[1,1]^2+Points[i,2]^2/S[2,2]^2-(2*rho*Points[i,1]*Points[i,2])/(S[1,1]*S[2,2]))))
           print("----")
           print("----")
    } # end for-loop     
    } # end function

    # execute the function and compare the results
    Comparison(Points=x,mean=mu,sigma=S,quantity=4)

Solution

  • Remember that S is the variance-covariance matrix. The formula you use from Wikipedia uses the standard deviation and not the variance. Hence you need to plug in the square root of the diagonal entries into the formula. This is also the reason why it works when you choose 1 as the diagonal entries (both the variance and the SD is 1).

    See your modified code below:

     library(mvtnorm)  # for loading the package if necessary
    
     mu=c(0,0)
     rho=0
     sigma=c(2,1)  # the standard deviation which should be changed to two or one      third or… to see the different results
     S=matrix(c(sigma[1],0,0,sigma[2]),ncol=2,byrow=TRUE)
    
     x=rmvnorm(n=100,mean=mu,sigma=S)
     dim(x)  # for control
     x[1:5,]  # for visualization
    
     # defining a function
     Comparison=function(Points=x,mean=mu,sigma=S,quantity=4) {
       for (i in 1:quantity) {
         print(paste0("The ",i," random point"))
         print(Points[i,])
         print("The following two results should be the same")
         print("Result from the function 'dmvnorm' out of package 'mvtnorm'")
         print(dmvnorm(Points[i,],mean=mu,sigma=sigma,log=FALSE))
         print("Result from equation out of wikipedia")
         SS <- sqrt(S)
         print(1/(2*pi*SS[1,1]*SS[2,2]*(1-rho^2)^(1/2))*exp((-1)/(2*(1-rho^2))*(Points[i,1]^2/SS[1,1]^2+Points[i,2]^2/SS[2,2]^2-(2*rho*Points[i,1]*Points[i,2])/(SS[1,1]*SS[2,2]))))
         print("----")
        print("----")
      } # end for-loop     
    } # end function
    
    # execute the function and compare the results
    Comparison(Points=x,mean=mu,sigma=S,quantity=4)
    

    So your comment when you define sigma is not correct. In your code, sigma is the variances, not the standard deviations if you judge by how you construct S.