I would like to know the meaning of the error as given by different Python integration routines, e.g. dblquad. Since we do not know the exact integration value, how is the estimate of error calculated? What is the reference? In some of my calculations I see that increasing the limit of integration takes the error to extremely high values. Since it is just an estimate of error, is it at all advisable to rely on such a result?
is it at all advisable to rely on such a result?
In most cases, yes. But if you think the integration routine is behaving strangely and you don't trust its output, do try to change the approach: for example, divide the region of integration into parts, integrate over each part separately and see if the results add up.
In numerical integration, one estimates the error by using two methods to compute the integral (or the same method with two step sizes), and considering the difference between the results. Deep within Fortran source of SciPy's quadpack routines we find
abserr = dabs((resk-resg)*hlgth)
where resg is the result of the 10-point Gauss formula, and resk is the result of the 21-point Kronrod formula. See the Wikipedia article Gauss–Kronrod quadrature formula for the mathematical meaning of these. (The hlgth
is half the length of the integral of integration; the length is here due to scaling.)
Actually, the formula I quoted is not the final error estimate, it's a very crude first approach to it. Two lines later we see
abserr = resasc*(0.2d+03*abserr/resasc)**1.5d+00)
which is exactly what the Wikipedia article states:
The recommended error estimate is (200*|gauss - kronrod|)1.5
This kind of estimate of the absolute error is not guaranteed to bound the difference between the computed and actual integral (the latter being unknown). The "recommended" estimate tends to work in practice, and there's some mathematical justification for the exponent 1.5 (the order of convergence of the method) but we never know if it really covers the actual error.
After all, the function was only evaluated at finitely many points within its domain. For all we know, it could happen to be 0 at those points and something huge elsewhere, at the points where the integration routine didn't look.
Here is an integral of a simple-looking function which is evaluated incorrectly:
from scipy.integrate import quad
import numpy as np
quad(lambda x: np.exp(-x**2), -1e2, 1e3)
returns (4.176612573788305e-60, 7.896357364711954e-60)
. The actual integral is about 1.77 (the square root of pi). The error estimate 8e-60 is completely wrong, as is the value 4e-60. The reason is that this function is localized near 0, and the interval of integration is [-100, 1000], which is much larger. The quad
algorithm did not happen to sample the function at any points with sizable values, so it proceeded on the idea that it's pretty much zero everywhere.