I am having the following issue. I am trying to integrate a pdf function of bivariate normal over the entire support. However, Matlab returns 0 for d, which shouldn't be the case. What is wrong with my approach?
The code is below:
mu1=100;
mu2=500;
sigma1=1;
sigma2=2;
rho=0.5;
var1=sigma1^2;
var2=sigma2^2;
pdf = @(x,y) (1/(2*pi*sigma1*sigma2*sqrt(1-rho^2))).*exp((-1/(2*(1-rho^2)))*(((x-mu1).^2/var1)+((y-mu2).^2/var2)-((2*rho*(x-mu1).*(y-mu2))/(sigma1*sigma2))));
d = integral2(pdf,-Inf,Inf,-Inf,Inf)
As @Andras Deak commented, the "exponentials cut off very fast away from the peak". As a matter of fact, you can visualize it:
mu1=100;
mu2=500;
sigma1=1;
sigma2=2;
rho=0.5;
var1=sigma1^2;
var2=sigma2^2;
pdf = @(x,y) (1/(2*pi*sigma1*sigma2*sqrt(1-rho^2))).*exp((-1/(2*(1-rho^2)))*(((x-mu1).^2/var1)+((y-mu2).^2/var2)-((2*rho*(x-mu1).*(y-mu2))/(sigma1*sigma2))));
figure
fsurf(pdf,[90 110 490 510])
figure
fsurf(pdf,[0 200 400 600])
In the first figure, the limits are close to the means you provided. You can see the shape of the bivariate normal:
If you extend the limits, you will see what it looks like a discontinuity:
The built-in integral functions try to evaluate the integrals, but if your limits are -inf
and inf
, your function is zero almost everywhere, with a discontinuity close to the means.
To treat singularities, you should break your domain, as suggested by MATLAB. Since the function is zero almost everywhere, you can integrate only around the means:
d = integral2(pdf,90,110,490,510)
> d =
>
> 1.0000
You can also write it as a function of your variables. The empirical rule states that 99.7% of your data is within 3 standard deviations from the means, so:
d = integral2(pdf,mu1-3*sigma1,mu1+3*sigma1,mu2-3*sigma2,mu2+3*sigma2)
> d =
>
> 0.9948
which will get you a pretty good result. We can elaborate more. In the wikipedia page of the empirical rule, the expression
erf(x/sqrt(2))
will give the "Expected fraction of population inside range mu+-x*sigma
". For the short precision shown as standard in MATLAB, if you choose, say x=5, you will get:
x = 5;
erf(x/sqrt(2))
> ans =
>
> 1.0000
Pretty much every data is contained within 5 standard deviations. So, you may neglect the domain outside this range in the double integration to avoid the (almost) singularity.
d = integral2(pdf,mu1-x*sigma1,mu1+x*sigma1,mu2-x*sigma2,mu2+x*sigma2)
> d =
>
> 1.0000