Search code examples
numpyprobabilitynumerical-methodsmontecarlonumerical-integration

Custom Python Monte Carlo integration function is underestimating multi-dimensional integrals


I need to create a custom Monte Carlo integration function to adapt to custom multi-dimensional distribution objects using NumPy. I need it to integrate over the same values in each dimension. It works correctly for a single dimension, but underestimates in multiple dimensions, which gets worse with higher dimensions. I am using this paper (equation 5) as a guide. Is my equation of Volume * Mean Density incorrect? Is my sampling method incorrect? I'm really at a loss for what the error is.

import numpy as np
from scipy.stats import multivariate_normal

# Set up distribution parameters.
dim = 3
loc = np.repeat(0., repeats=dim)
scale = np.repeat(1., repeats=dim)

# Initialize a multivariate normal distribution.
mvn = multivariate_normal(mean=loc, cov=scale)

def mc_integrator(distribution, dim, support, size=1000, seed=0):
    """
    Parameters
    ----------
    distribution : function
        A probability density function.
    dim : int
        The number of dimensions of the distribution.
    support : list
        List of the low and high values of the hypercube to integrate over.
    size : int, optional
        Number of samples used for estimation. The default is 1000.
    seed : int, optional
        A random seed for reproducibility. The default is 0.

    Returns
    -------
    float
        The estimate of the integral over the hypercube.
    """
    # Set the random seed.
    np.random.seed(seed)
    # Separate the elements of the support.
    a, b = support[0], support[1]
    # Calculate the volume of the hypercube.
    volume = (b-a)**dim
    # Generate random samples of the appropriate shape.
    samples = np.random.uniform(low=a, high=b, size=(size,dim))
    # Return the estimate of the integral.
    return volume*np.mean(distribution(samples))

# Set the number of samples to use for estimation.
size = 10000
# Set the low and high value over each dimension of the hypercube.
support = [-2, 2]
# Print the estimate of the integral.
print(mc_integrator(mvn.pdf, dim, support, size=size))
# Print the exact value of the integral.
print(mvn.cdf(np.repeat(support[1], dim))-mvn.cdf(np.repeat(support[0], dim)))

Output: 0.8523870204938726
        0.9332787601629401

Solution

  • John, looks good overall but it looks to me that you're figuring the expected result incorrectly. I think the expected result should be (F(2) - F(-2)^3 where F is the Gaussian cdf for mean 0 and variance 1. For F(2) - F(-2), I get erf(sqrt(2)) which is approximately 0.9545, and then (F(2) - F(-2))^3 is 0.8696, which agrees pretty well with your results.

    I don't know what mvn.cdf is supposed to return, but the concept of "cdf" is a little fishy in more than one dimension, so maybe you can steer away from that.

    About multidimensional integration in general, you mention using Halton sequences. I think that's an interesting idea too. My experience with computing integrals is to use quadrature rules in 1 or 2 dimensions, low-discrepancy sequences in 3 to several (5? 7? I dunno), and MC in more than that. Oh, and also my advice is to work pretty hard for exact results before resorting to numerical approximations.

    I would be interested to hear about what you're working on.