Search code examples
pythonnumpydivide-by-zero

"Divide by zero encountered in log" with very large arguments


I am trying to evaluate this function:

import scipy as sci
import numpy as np

np.pi*sci.special.gamma(2*i-1)/((2**(2*(i-1)))*(sci.special.gamma(i)**2))

which is running into trouble with "divide by zero", so I split it up in parts using a log to see, where the divide by zero appears.

The approach looks then in pseudo code:

log(f) = log(pi)+log(gamma(...))-log(2**...)-log(gamma(...))

The problem appears in the third part.

I want to solve this term for a range of i, so interestingly, this appears, when trying a loop:

N = np.arange(2,21)
for i in N:
    f = np.log(2**(2*(i-1)))
    print(i, f)

Out:
    2 1.3862943611198906
    3 2.772588722239781
    4 4.1588830833596715
    5 5.545177444479562
    6 6.931471805599453
    7 8.317766166719343
    8 9.704060527839234
    9 11.090354888959125
    10 12.476649250079015
    11 13.862943611198906
    12 15.249237972318797
    13 16.635532333438686
    14 18.021826694558577
    15 19.408121055678468
    16 20.79441541679836
    17 -inf
    18 -inf
    19 -inf
    20 -inf
    __main__:2: RuntimeWarning: divide by zero encountered in log

So it seems, that the problem occurs, when the argument becomes too large (i=17).

On the other hand, when I just calculate the single value for i=17, it works (and also for even larger i):

np.log(2**(2*(17-1)))
Out: 22.18070977791825

Where is the division by zero, and how can I generally overcome this problem when reaching to large arguments?


Solution

  • The main problem comes from an integer overflow due to the implicit type of np.arange. Indeed, np.arange(2,21) appears to result in an array of int32-typed values, and so i is of type int32 too, and so 2**(2*(i-1)) is. 2**(2*(i-1)) with i>=17 is greater than the limit of 2**31 (of the int32 type). The overflow cause the value to be truncated to 0, hence the Numpy error when the log is computed (since the log function is not defined for the input 0).

    One simple solution is to change the type to int64 values, but this is not great as the output exponent quickly grow. Using floats is a bit better. However, this is far from being a great idea since the maximum exponent will be quickly reached with i values close to 500 or above. The root of the issue is the numerical stability of the computation.

    A good solution is to fix the numerical stability issue by using mathematical transformations. Indeed, log2(2**n) = n and log(n) = log2(n)*log(2). Thus, log(2**(2*(i-1))) = log2(2**(2*(i-1)))*log(2) = (2*(i-1))*log(2). As a result the following code should give equivalent results but without numerical issues:

    N = np.arange(2,21)
    log2 = np.log(2)
    for i in N:
        f = (2*(i-1))*log2
        print(i, f)
    

    Note that this code is actually faster to compute and simpler. Note also that there is no problem with huge i values like the one bigger than 1,000,000 with this method! Here is the result:

    2 1.3862943611198906
    3 2.772588722239781
    4 4.1588830833596715
    5 5.545177444479562
    6 6.931471805599453
    7 8.317766166719343
    8 9.704060527839234
    9 11.090354888959125
    10 12.476649250079015
    11 13.862943611198906
    12 15.249237972318797
    13 16.635532333438686
    14 18.021826694558577
    15 19.408121055678468
    16 20.79441541679836
    17 22.18070977791825
    18 23.56700413903814
    19 24.95329850015803
    20 26.33959286127792