I'm trying to calculate the likelihood of successfully guessing a password in Python. For example, if we take a 10-character lowercase password (26**10 possible passwords) and we can make 1 billion guesses a second, we can calculate the probability of successfully guessing the password within one hour with:
from scipy.stats import binom
(1 - binom.pmf(k=0, n=1000000000*3600, p=1 / 26**10))
Which gives us the result of 0.02525515384826793
(i.e., 2.5%). However, this doesn't work as we increase the length of the password (or, more strictly, as p
gets closer to zero). For instance, if we increase the length of the password to 12 characters:
from scipy.stats import binom
(1 - binom.pmf(k=0, n=1000000000*3600, p=1 / 26**12))
Then the returned value is just 0.0, which is incorrect, presumably due to a float being rounded somewhere along the way: p is about 1.0e-17 here, well below the double-precision machine epsilon (~2.2e-16), so 1 - p evaluates to exactly 1.0. How can I calculate this to get a more precise answer?
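A quick check in the interpreter seems to confirm that suspicion:

import sys
print(sys.float_info.epsilon)  # 2.220446049250313e-16
p10 = 1 / 26**10               # ~7.1e-15, big enough that 1 - p != 1.0
p12 = 1 / 26**12               # ~1.0e-17, too small: 1 - p rounds to 1.0
print(1 - p10 == 1.0)          # False
print(1 - p12 == 1.0)          # True, so (1 - p12)**n is exactly 1.0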
Edit: this was with SciPy 1.10.1. Testing on the latest version (1.11.2 at the time of writing) gave the correct value, so it looks like this was an issue with the older version.
I think this is down to the limitations of floating-point arithmetic, which I also often run into (in this case precision loss rather than overflow): binom.pmf(k=0, n=n, p=p) is mathematically (1 - p)**n, so once 1 - p rounds to exactly 1.0, the whole expression collapses to 1 - 1.0 = 0.0.
You can use the mpmath library, which provides arbitrary-precision arithmetic:
from mpmath import mp

mp.dps = 50  # set the working precision to 50 decimal digits (adjust as needed)

total_passwords = 26**12
guesses_per_second = 1000000000
seconds_in_an_hour = 3600

# At 50 decimal digits, 1 - p no longer rounds to exactly 1
p = mp.mpf(1) / mp.mpf(total_passwords)
n = mp.mpf(guesses_per_second) * mp.mpf(seconds_in_an_hour)

# P(at least one success in n guesses) = 1 - P(no successes) = 1 - (1 - p)**n
probability = 1 - mp.power(1 - p, n)
print(float(probability))
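Alternatively, if you would rather avoid an extra dependency, plain double precision is enough once you sidestep the troublesome 1 - p subtraction. Here is a small sketch of my own (not part of the mpmath approach above) that rewrites 1 - (1 - p)**n as -expm1(n * log1p(-p)), using math.log1p and math.expm1, which stay accurate for arguments near zero:

from math import expm1, log1p

total_passwords = 26**12
n = 1000000000 * 3600
p = 1 / total_passwords

# 1 - (1 - p)**n == -expm1(n * log1p(-p)), but without ever forming 1 - p
probability = -expm1(n * log1p(-p))
print(probability)  # roughly 3.8e-5 for the 12-character case

For p this small the answer is essentially n * p, but the log1p/expm1 form stays accurate across the whole range of p and n.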
Extra note: floating-point numbers lose precision even when you are working with seemingly harmless numbers like 0.1 or 0.2 (neither has an exact binary representation). You should be extra careful when performing a large number of floating-point operations over the same data, as errors can build up rather quickly.
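For example:

print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False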
I hope this helps.