Search code examples
pythonmathpercentagemontecarlo

How do I find the percentage error in a Monte Carlo algorithm?


I have written a Monte Carlo program to integrate a function f(x).

I have now been asked to calculate the percentage error.

Having done a quick literature search, I found that this can be given with the equation %error = (sqrt(var[f(x)]/n))*100, where n is the number of random points I used to derive my answer.

However, when I run my integration code, my percentage error is greater than that given by this formula.

Do I have the correct formula?

Any help would be greatly appreciated. Thanks x


Solution

  • Here is quick example - estimate integral of linear function on the interval [0...1] using Monte-Carlo. To estimate error you have to collect second momentum (values squared), then compute variance, standard deviation, and (assuming CLT), error of the simulation in the original units as well as in %

    Code, Python 3.7, Anaconda, Win10 64x

    import numpy as np
    
    def f(x): # linear function to integrate
        return x
    
    np.random.seed(312345)
    
    N = 100000
    
    x  = np.random.random(N)
    q  = f(x)  # first momentum
    q2 = q*q   # second momentum
    
    mean = np.sum(q) / float(N) # compute mean explicitly, not using np.mean
    var  = np.sum(q2) / float(N) - mean * mean # variance as E[X^2] - E[X]^2
    sd   = np.sqrt(var) # std.deviation
    
    print(mean) # should be 1/2
    print(var)  # should be 1/12
    print(sd)   # should be 0.5/sqrt(3)
    print("-----------------------------------------------------")
    
    sigma = sd / np.sqrt(float(N)) # assuming CLT, error estimation in original units
    
    print("result = {0} with error +- {1}".format(mean, sigma))
    
    err_pct = sigma / mean * 100.0 # error estimate in percents
    
    print("result = {0} with error +- {1}%".format(mean, err_pct))
    

    Be aware, that we computed one sigma error and (even not talking about it being random value itself) true result is within printed mean+-error only for 68% of the runs. You could print mean+-2*error, and it would mean true result is inside that region for 95% cases, mean+-3*error true result is inside that region for 99.7% of the runs and so on and so forth.

    UPDATE

    For sampling variance estimate, there is known problem called Bias in the estimator. Basically, we underestimate a bit sampling variance, proper correction (Bessel's correction) shall be applied

    var  = np.sum(q2) / float(N) - mean * mean # variance as E[X^2] - E[X]^2
    var *= float(N)/float(N-1)
    

    In many cases (and many examples) it is omitted because N is very large, which makes correction pretty much invisible - f.e., if you have statistical error 1% but N is in millions, correction is of no practical use.