Search code examples
pythonscipy

How to calculate binomial probabilities in Python with very small numbers?


I'm trying to calculate the likelihood of successfully guessing a password in Python. For example, if we take a 10 character lowercase password (26**10 possible passwords) and we can make 1 billion guesses a second, we can calculate the probability of successfully guessing the password in one hour with:

from scipy.stats import binom
(1 - binom.pmf(k=0, n=1000000000*3600, p=1 / 26**10))

Which gives us the the result of 0.02525515384826793 (i.e, 2.5%). However, this doesn't work as we increase the length of the password (or more strictly, as p gets closer to zero). For instance if we increase the length of the password to 12 characters:

from scipy.stats import binom
(1 - binom.pmf(k=0, n=1000000000*3600, p=1 / 26**12))

Then the returned value is just 0.0, which is incorrect - presumably due to the float rounding down to zero at some point. How can I calculate this to get a more precise answer?


Edit: this was with SciPy 1.10.1. Testing on the latest version (1.11.2 at time of writing) gave the correct value - so it looks like this was an issue with the older version.


Solution

  • I think this is about the limitations of floating-point arithmetic which I also often face (i.e, float overflow).

    you can use the mpmath library, which provides arbitrary-precision arithmetic:

    from mpmath import mp
    mp.dps = 50  # Set the desired precision (adjust as needed)
    total_passwords = 26**12
    guesses_per_second = 1000000000
    seconds_in_an_hour = 3600
    
    p = mp.mpf(1) / mp.mpf(total_passwords)
    n = mp.mpf(guesses_per_second) * mp.mpf(seconds_in_an_hour)
    
    probability = 1 - mp.power(1 - p, n)
    print(float(probability))
    

    Extra-Note: Floating-point numbers lose precision even when you are just working with such seemingly harmless numbers like 0.2 or 76.5. You should be extra careful when working with a large amount of floating-point operations over the same data as errors may build up rather quickly.

    I hope this can help somehow.