
The functions "integrate.quad" in Python and "integrate"/"integral" in R give erroneous results


I need to evaluate several integrals, and I am using the normal(0, 1) density as a test case.

In Python:

import scipy.integrate as integrate
import scipy.stats
import numpy as np

def integrand(x):
    # Standard normal pdf; the total area under the curve is 1
    return scipy.stats.norm(0, 1).pdf(x)

print(integrate.quad(integrand, -np.inf, 0))
print(integrate.quad(integrand, -np.inf, 100))

(0.4999999999999999, 5.089095674629994e-09)

(0.0, 0.0)

I was very puzzled that the integral over (-inf, 0) was computed correctly, while the integral over (-inf, 100), which should be close to 1, came out as exactly 0. Therefore, I tried the following in R:

integrate(dnorm,-Inf,0)

0.5 with absolute error < 4.7e-05

integrate(dnorm,-Inf,100,abs.tol=0L)

0 with absolute error < 0

library(pracma)
integral(dnorm,-Inf,0)

[1] 0.5

integral(dnorm,-Inf,100,abstol=0)

[1] 0

What on earth is going on? What adaptive method should I use?


Solution

  • Looking up the QAGI and QAGS algorithms, it appears that the following is happening:

    1. The infinite domain x ∈ (-inf, b] is mapped onto t ∈ (0, 1] by the transformation x = b - (1 - t)/t, so that the integral can be evaluated over a finite interval. See the specs here.

    2. The adaptive quadrature algorithm is then used to evaluate the integral over (0, 1]. Passing limit=1 into your scipy call produces the message "The maximum number of subdivisions (1) has been achieved."; passing limit=2 does not. This suggests that in Step 4 of the algorithm the estimate of the integral, Q, and the estimate of the error, ε, are equal (a sketch reproducing this follows the summary below).

    3. This presumably happens because no significant points of the integrand are sampled in the initial estimate. Taking 21 evenly spaced points for t ∈ (0, 1] yields x-values ranging from about 80 to 100, and at every one of these the pdf is effectively zero. The nodes the algorithm actually uses are not evenly spaced (per this page), but presumably a similar result is achieved; see the sketch just after this list.
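    Here is a minimal sketch of steps 1 and 3, using evenly spaced t-values as a stand-in for the actual (non-uniform) Gauss-Kronrod nodes:

    import numpy as np
    import scipy.stats

    b = 100
    t = np.arange(1, 22) / 21.0           # 21 evenly spaced points in (0, 1]
    x = b - (1 - t) / t                   # QAGI transformation: t in (0, 1] -> x in (-inf, b]

    print(x.min(), x.max())               # 80.0 100.0 -- every sample lands in [80, 100]
    print(scipy.stats.norm.pdf(x).max())  # 0.0 -- the pdf underflows to zero at all of them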

    So, in summary, the mapping from (-inf, 100] to (0, 1] skews the sampled values toward the endpoint x = 100. Since the normal pdf is effectively zero there, the algorithm doesn't know that it is missing the mass near x = 0, so it never subdivides to improve accuracy.
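    A sketch reproducing the limit experiment from step 2 (the exact warning wording may vary across scipy versions):

    import warnings
    import numpy as np
    import scipy.integrate as integrate
    import scipy.stats

    pdf = scipy.stats.norm(0, 1).pdf

    # With limit=1 the algorithm wants to subdivide but is not allowed to:
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        print(integrate.quad(pdf, -np.inf, 100, limit=1))
        for w in caught:
            print(w.message)  # "The maximum number of subdivisions (1) has been achieved..."

    # With limit=2 there is no warning: the estimate and the error estimate agree,
    # so the algorithm believes it has converged -- to 0.
    print(integrate.quad(pdf, -np.inf, 100, limit=2))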

    Also, scipy and R both wrap the same QUADPACK routines, so it makes sense that they produce the same results.

    If you integrate from -100 to 100, the midpoint 0 will be an evaluation point, which allows the algorithm to function as intended. But if you integrate from -1000 to 100, the algorithm misses the significant points again and you end up with an integral of 0.
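    One workaround consistent with this explanation (a sketch, not the only option) is to split the range at a point where the integrand has significant mass, so the adaptive scheme cannot skip over it:

    import numpy as np
    import scipy.integrate as integrate
    import scipy.stats

    pdf = scipy.stats.norm(0, 1).pdf

    # Splitting at x = 0 puts the distribution's mass at an endpoint of each piece:
    left, left_err = integrate.quad(pdf, -np.inf, 0)
    right, right_err = integrate.quad(pdf, 0, 100)
    print(left + right, left_err + right_err)  # ~1.0, as expected

    For finite limits, quad also accepts a points argument for flagging locations where the integrand varies rapidly, but it cannot be combined with infinite limits, so splitting the range manually is the usual approach there.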