Search code examples
pythonscipylarge-data

Log-computations in Python


I'm looking to compute something like:

formula

Where f(i) is a function that returns a real number in [-1,1] for any i in {1,2,...,5000}.

Obviously, the result of the sum is somewhere in [-1,1], but when I can't seem to be able to compute it in Python using straight forward coding, as 0.55000 becomes 0 and comb(5000,2000) becomes inf, which result in the computed sum turning into NaN.

The required solution is to use log on both sides.

That is using the identity a × b = 2log(a) + log(b), if I could compute log(a) and log(b) I could compute the sum, even if a is big and b is almost 0.

So I guess what I'm asking is if there's an easy way of computing

log2(scipy.misc.comb(5000,2000))

So I could compute my sum simply by

sum([2**(log2comb(5000,i)-5000) * f(i) for i in range(1,5000) ])

@abarnert's solution, while working for the 5000 figure addresses the problem by increasing the precision in which the comb is computed. This works for this example, but doesn't scale, as the memory required would significantly increase if instead of 5000 we had 1e7 for example.

Currently, I'm using a workaround which is ugly, but keeps memory consumption low:

log2(comb(5000,2000)) = sum([log2 (x) for x in 1:5000])-sum([log2 (x) for x in 1:2000])-sum([log2 (x) for x in 1:3000])

Is there a way of doing so in a readable expression?


Solution

  • The sum

    formula

    is the expectation of f with respect to a binomial distribution with n = 5000 and p = 0.5.

    You can compute this with scipy.stats.binom.expect:

    import scipy.stats as stats
    
    def f(i):
        return i
    n, p = 5000, 0.5
    print(stats.binom.expect(f, (n, p), lb=0, ub=n))
    # 2499.99999997
    

    Also note that as n goes to infinity, with p fixed, the binomial distribution approaches the normal distribution with mean np and variance np*(1-p). Therefore, for large n you can instead compute:

    import math
    print(stats.norm.expect(f, loc=n*p, scale=math.sqrt((n*p*(1-p))), lb=0, ub=n))
    # 2500.0