Search code examples
pythonscipysympy

Use of Sympy and Scipy's dblquad leads to numerical integration failing to converge


Introduction

The Python Scipy library provides the dblquad function, which allows one to calculate 2-dimensional integrals numerically. The Python Sympy library provides tools to define and evaluate symbolic expressions in Python.

When combining the dblquad function with Sympy expressions, undesired behavior as described below occurs.

My library versions and setup

  • Python 3.9.13,
  • Scipy 1.13.1
  • Sympy 1.12.1.

All of the code below is assumed to be prefixed with the imports

import numpy as np

from scipy.integrate import dblquad
from sympy.functions.special.beta_functions import beta

Problem statement

When running the following code:

function = lambda x, y: beta(2, 2) if 0 < x < 1 and 0 < y < 1 else 0

result = dblquad(lambda x, y: function(y, x),
    0.99 / 2, 1,  # boundaries for x
    lambda x: 0.99 / x, lambda x: np.inf  # boundaries for y
)

the program doesn't terminate (at least not in any reasonable amount of time for which I ran the code). This is surprising to me for the reasons listed below. Does someone have an explanation?

When replacing Sympy's beta(2, 2) with its numeric value 1/6, the program runs almost instantly

If instead of the above code we run

function = lambda x, y: 1 / 6 if 0 < x < 1 and 0 < y < 1 else 0

result = dblquad(lambda x, y: function(y, x),
    0.99 / 2, 1,  # boundaries for x
    lambda x: 0.99 / x, lambda x: np.inf  # boundaries for y
)

(i.e. we just replace beta(2, 2) by the numeric value 1/6), the program terminates with a result very fast.

When integrating over the whole first quadrant, the program works

If instead of the initial code we run

function = lambda x, y: beta(2, 2) if 0 < x < 1 and 0 < y < 1 else 0

result = dblquad(lambda x, y: function(y, x),
    0, np.inf,  # boundaries for x
    0, np.inf  # boundaries for y
)

the code runs very fast.

Final question

Why does the above behavior occur and how can I make the initial program run. (In the application I have in mind, I am in need of more complicated behavior from Sympy which I cannot replace as easily with something else as I can replace beta(2, 2) by 1/6).


Solution

  • Although the documentation of dblquad is not explicit about the return type of the integrand, it is reasonable to assume that it must be numeric.

    type(beta(2, 2))  # beta
    

    Doesn't look numeric.

    import numpy as np
    np.issubdtype(float, np.number)  # True
    np.issubdtype(np.float64, np.number)  # True
    np.issubdtype(type(beta(2, 2)), np.number)  # False
    

    It isn't - not according to NumPy's definition of a number, anyway. And SciPy has strong ties to NumPy, so NumPy's definition of a number is a reasonable one to use.

    So I'd suggest that it's more surprising that the code ever works than that it doesn't always work.

    Now, if it ever does work, it's probably because at some point this beta object it gets coerced to a float; e.g., something like:

    float(beta(2, 2))  # 0.16666666666666666
    

    happens at some point.

    When replacing Sympy's beta(2, 2) with its numeric value 1/6, the program runs almost instantly

    Evaluating evaluating 1/6 is much faster than evaluating beta(2, 2) using SymPy and then converting to a float.

    %timeit 1/6
    # 15.3 ns ± 3.47 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)
    %timeit float(beta(2, 2))
    # 163 µs ± 31.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    

    So it's reasonable to expect the integral will take 10000 times longer to run, if it does run.

    When integrating over the whole first quadrant, the program works

    On Colab, both work, but the whole first quadrant is a much simpler problem, so it executes much faster.

    %timeit dblquad(lambda x, y: function(y, x), 0, np.inf, 0, np.inf)
    # 78.7 ms ± 15.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    %timeit dblquad(lambda x, y: function(y, x), 0.99 / 2, 1, lambda x: 0.99 / x, lambda x: np.inf)
    # 9.92 s ± 438 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    

    Integrating over the whole first quadrant is much faster even if you use 1/6 instead of beta(2, 2).

    how can I make the initial program run

    There's a good comment about lambdify:

    import numpy as np
    
    from scipy.integrate import dblquad
    from sympy.functions.special.beta_functions import beta
    
    from sympy import lambdify
    from sympy.abc import x, y, z
    
    # first argument: a sequence of symbolic arguments
    # second argument: symbolic expression
    # output: function that accepts numbers and returns numbers
    my_beta = lambdify((x, y), beta(2, 2))  # the expression `beta(2, 2)` doesn't use `x` or `y`, but it could be replaced with an expression that does
    

    Now we can pass numerical arguments to my_beta, and it will return beta(2, 2) as a number:

    my_beta(1, 10)  # 0.16666666666666666
    %timeit my_beta(1, 10)
    # 1.98 µs ± 64.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    

    So you can use it in your code like:

    function = lambda x, y: my_beta(x, y) if 0 < x < 1 and 0 < y < 1 else 0
    
    result = dblquad(lambda x, y: function(y, x),
        0.99 / 2, 1,  # boundaries for x
        lambda x: 0.99 / x, lambda x: np.inf  # boundaries for y
    )
    

    Another option would be to use write your expression directly using scipy special functions. For example, use scipy.special.beta to evaluate the beta function.

    from scipy.special import beta
    function = lambda x, y: beta(2, 2) if 0 < x < 1 and 0 < y < 1 else 0
    result = dblquad(lambda x, y: function(y, x),
        0.99 / 2, 1,  # boundaries for x
        lambda x: 0.99 / x, lambda x: np.inf  # boundaries for y
    )