I've discovered a strange behavior when using scipy.integrate.quad. This behavior also shows up in Octave's quad function, which leads me to believe that it may have something to do with QUADPACK itself. Interestingly enough, using the exact same Octave code, this behavior does not show up in MATLAB.
On to the question. I'm numerically integrating a lognormal distribution over various bounds. For F is cdf of lognormal, a is lower bound and b is upper bound, I find that under some conditions,
integral(F, a, b) = 0 when b is a "very large number," while
integral(F, a, b) = the correct limit when b is np.inf. (or just Inf for Octave.)
Here's some example code to show it in action:
from __future__ import division
import numpy as np
import scipy.stats as stats
from scipy.integrate import quad
# Set up the probability space:
sigma = 0.1
mu = -0.5*(sigma**2) # To get E[X] = 1
N = 7
z = stats.lognormal(sigma, 0, np.exp(mu))
# Set up F for integration:
F = lambda x: x*z.pdf(x)
# An example that appears to work correctly:
a, b = 1.0, 10
quad(F, a, b)
# (0.5199388..., 5.0097567e-11)
# But if we push it higher, we get a value which drops to 0:
quad(F, 1.0, 1000)
# (1.54400e-11, 3.0699e-11)
# HOWEVER, if we shove np.inf in there, we get correct answer again:
quad(F, 1.0, np.inf)
# (0.5199388..., 3.00668e-09)
# If we play around we can see where it "breaks:"
quad(F, 1.0, 500) # Ok
quad(F, 1.0, 831) # Ok
quad(F, 1.0, 832) # Here we suddenly hit close to zero.
quad(F, 1.0, np.inf) # Ok again
What is going on here? Why does quad(F, 1.0, 500) evaluate to approximately the correct thing, but quad(F, 1.0, b) goes to zero for all values 832 <= b < np.inf?
While I'm not exactly familiar with QUADPACK, adaptive integration generally works by increasing resolution until the answer no longer improves. Your function is so close to 0 for most of the interval (with F(10)==9.356e-116
) that the improvement is negligible for the initial grid points that quad chooses, and it decides that the integral must be close to 0. Basically, if your data hides in a very narrow subinterval of the range of integration, quad
eventually won't be able to find it.
For integration from 0
to inf
, the interval obviously cannot be subdivided into a finite number of intervals, so quad
will need some preprocessing before computing the integral. For example, a change of variables like y=1/(1+x)
would map the interval 0..inf
to 0..1
. Subdividing that interval will sample more points near zero from the original function, enabling quad
to find your data.