Search code examples
pythonalgorithmnumpyprobabilitybinomial-coefficients

Computing a binomial probability for huge numbers


I want to compute binomial probabilities on python. I tried to apply the formula:

probability = scipy.misc.comb(n,k)*(p**k)*((1-p)**(n-k))

Some of the probabilities I get are infinite. I checked some values for which p=inf. For one of them, n=450,000 and k=17. This value must be greater than 1e302 which is the maximum value handled by floats.

I then tried to use sum(np.random.binomial(n,p,numberOfTrials)==valueOfInterest)/numberOfTrials

This draws numberOfTrials samples and computes the average number of times the value valueOfInterest is drawn.

This doesn't raise any infinite value. However, is this a valid way to proceed? And why this way wouldn't raise any infinite value whereas computing the probabilities does?


Solution

  • Work in the log domain to compute combination and exponentiation functions and then raise them to exponent.

    Something like this:

    combination_num = range(k+1, n+1)
    combination_den = range(1, n-k+1)
    combination_log = np.log(combination_num).sum() - np.log(combination_den).sum()
    p_k_log = k * np.log(p)
    neg_p_K_log = (n - k) * np.log(1 - p)
    p_log = combination_log + p_k_log + neg_p_K_log
    probability = np.exp(p_log)
    

    Gets rid of numeric underflow/overflow because of large numbers. On your example with n=450000 and p = 0.5, k = 17, it returns p_log = -311728.4, i. e., the log of final probability is pretty small and hence underflow occurs while taking np.exp. However, you can still work with log probability.