java algorithm math cryptography bouncycastle

Check my math: The bouncycastle issue: Odds 2 non-equal passwords considered equal

This is the 'check if hashes are equal' code of BouncyCastle's v1.66 release of their OpenBSD BCrypt implementation:

for (int i = 0; i != sLength; i++){
  isEqual &= (bcryptString.indexOf(i) == newBcryptString.indexOf(i));
}

where sLength is guaranteed 60 (see line 268), and bcryptString is a full openbsd-style bcrypt string, such as for example $2y$10$.vGA1O9wmRjrwAVXD98HNOgsNpDczlqm3Jq7KnEd1rVAGv3Fykk1a.

The error is the used method: They meant to use charAt.

The intent of this loop is to check, for each position from 0 to 59, if the character at position i in a is the same character at position i in b.

But, due to erroneous use of indexOf(int), instead, this checks if the position of the first character with unicode i in a matches the position of the first character with unicode i in b, with 'not in string' matching 'not in string'.

Example: "Hello".indexOf(101) returns 1 (java is 0-based, 101 is the unicode of e, and e is the second character). "Hello".indexOf(0) returns -1 (because there is no unicode-0 in "Hello").

I'm trying to do the math to figure out the answer to following question:

If you try to log in for a given user by using an arbitrarily chosen password instead of the actual password (and let's posit that the odds that you so happen to choose the exact password are zero), what are the odds that this algorithm erroneously considers the arbitrarily chosen password as 'equal'?

Construction of the openbsd string

As far as I can tell, this: $2y$10$.vGA1O9wmRjrwAVXD98HNOgsNpDczlqm3Jq7KnEd1rVAGv3Fykk1a

breaks down as follows:

$2y$ - constant - it's a marker that means 'bcrypt string', pretty much.
10 - constant per server - number of rounds used; it's 10 almost everywhere, and will always have the same value for any given user's passhash.
$ - constant, again.
.vGA1O9wmRjrwAVXD98HNO (22 characters): The 16-byte salt padded with 2 zero bytes, then base-64ed, and then chuck out the last 2 characters. This can be used to reconstruct the salt.
the rest (31 characters): The result of bcrypt(rounds, concat(salt+utf8encode(pass))), base64-encoded, and toss out the last character.
Note that the base64 impl uses all lowercase letters, all uppercase letters, all digits, the dot, and the slash.

Basic realizations about the odds

The faulty algorithm will end up checking if the position of the first occurrence of all characters in unicode range 0 to 59 (inclusive) is the same for both hashes ("realhash" and "inhash"). If all 60 have the same 'pos of first occurrence' for both realhash and inhash, the algorithm considers the passwords equal.

Of all unicode symbols between 0 and 59, the only ones that could even be in this string are 0123456789$./.

However, of those, the $012 are irrelevant: For any passhash.indexOf('$'), the answer is 0. For any passhash.indexOf('1'), the answer is 4. Same goes for 0 and 2. That leaves just 9 characters that could possibly result in the algorithm saying "inhash" is not equal to "realhash": 3456789./.

To figure out what the odds are, we need for each of these 9 characters to not end up being a differentiator. To figure out what the odds are that one specific character (let's say the '3') fails to differentiate, then that's just 1-Z with Z being the odds that the '3' is enough to differentiate.

Z = Q * (U*A + (1-U)*B)
Q = '3' is not in the salt section of realhash. (first 22)
U = '3' is in the pass section of inhash. (last 31)
A = either '3' is not in the pass section of realhash, or it is, but its first occurrence is not in the same place as the first occurrence of '3' in inhash.
B = '3' is in the pass section of realhash.
A = 1- (V*W); V = '3' is in the pass section of realhash, W = provided '3' is in both realhash and passhash, its first occurrence is the same place. Once Z has been determined, the odds that my arbitrary password results in this algorithm thinking it is correct, even when it isn't, is then defined by: Neither the '3', nor any of the other 8 characters was sufficient to differentiate. Thus: (1-Z)^9.

Z = Q * ( (U * (1 - (V * W))) + ((1-U) * (1-V)) )


Q = (63/64)^22   ~= 0.707184
U = 1-(63/64)^31 ~= 0.386269
V = 1-(63/64)^31 ~= 0.386269
W = 1/31         ~= 0.032258
(1 - (V * W))    ~= 0.987539
(1-U) * (1-V)    ~= 0.376666
Z                ~= 0.536131

Chance that the 3 fails to differentiate:
(1-Z)            ~= 0.463868

Chance that all of 3456789./ fail:
(1-Z)^9          ~= 0.00099438

Thus, roughly 0.001: about one in a thousand odds that the algorithm says that 2 passwords are equal when they are not.

Am I missing anything significant?

NB: BouncyCastle's most recent public version has fixed this bug. CVE-2020-28052 tracks the problem).

Solution

For problems like this, a Monte Carlo simulation is a useful sanity check. The result I got from the simulation was 0.0044, about 4 times higher than the calculated result in the question. That seemed high to me, so I did some debugging to see where that result was coming from.

It turns out that the vast majority of the false matches are due to one very simple mechanism: the 22 character salt eliminates some of the characters-of-interest, and the remaining characters-of-interest do not appear in the rest of the hash.

As mentioned in the question, there are 9 characters-of-interest: 3456789./

If any of those appear in the salt, then the indexOf() for that character will match, and that character is no longer of interest. Monte Carlo shows that on average, 2.6 of the 9 characters appear in the salt, and are eliminated from consideration. That makes sense because the salt contains at most 22 of the base-64 characters, so about one third. Here's a sample run of the Monte Carlo simulation:

 0     35645 
 1    156228 
 2    283916 
 3    281018 
 4    166381 
 5     61024 
 6     13791 
 7      1850 
 8       139 
 9         3

The first column is the number of characters-of-interest that were eliminated by the salt. The second column is the number of times that occurred out of 1 million attempts. For example, in 3 out of a million attempts, the salt eliminated all 9 characters-of-interest, which then guarantees a false match.

In 139 out of a million attempts, the salt eliminated 8 of the 9 characters-of-interest. The remaining character then either needs to match in the two hash strings, or it needs to be absent from both hash strings. The odds that it will be absent are (63/64) ^ 62 = 0.377.

So we can augment the table of results like so (these are Monte Carlo results):

 0     35645    
 1    156228    
 2    283916    
 3    281018    
 4    166381    
 5     61024    
 6     13791    
 7      1850    
 8       139        53   0.386053
 9         3         3   1.000000

The second to last line can be interpreted as follows: 8 characters-of-interest were eliminated by the salt in 139 out of 1 million attempts. Of the 139, 53 (or 38.6%) resulted in a match because the single remaining character-of-interest did not appear in the last 31 characters of either hash string.

Here are the full results:

 0     35645         2   0.000070   0
 1    156228        39   0.000250   5
 2    283916       214   0.000757  25
 3    281018       628   0.002237  64
 4    166381      1056   0.006349  83
 5     61024      1114   0.018260  68
 6     13791       702   0.050936  32
 7      1850       265   0.143467   7
 8       139        53   0.386053   0
 9         3         3   1.000000   0

false matches due to elimination and absence:   4076
false matches due to elimination, and matching:  284
total false matches:                            4360 out of 1 million

The last column is the number of times one or more characters of interest matched indexes in the final 31 characters of the hash strings.

Mathematical Analysis of the Problem

There are two steps to analyzing the problem:

Compute the odds that the salt (22 characters) will eliminate characters of interest (koi¹).
Compute the odds that the characters in the tail of the hash (the last 31 characters) either match (at the first occurrence), or don't occur.

^{(1): I'm using "koi" as the abbreviation since the spell checker thinks it's a fish, and will leave it alone.}

The decision tree for the first step looks like this:

The column headers are the number of characters in the salt seen so far. The column footers are the divisor for that column. The row labels are the number of koi left.

Column 0 of the tree:
- 1/1 is the probability that 9 koi remain after 0 characters have been seen.
Column 1 of the tree:
- 55/64 is the probability that the first character of the salt is not a koi.
- 9/64 is the probability that the first character of the salt is a koi, leaving 8 koi.
Column 2 of the tree:
- 55*55/4096 is the probability that neither of the first two characters is a koi.
- 55*9/4096 is the probability of a non-koi followed by a koi, leaving 8 koi left.
- 9*56/4096 is the probability that the first character was a koi (leaving 8) followed by a non-koi.
- 9*8/4096 is the probability that the first two characters are both koi, leaving 7 koi.

The full decision tree has 23 columns (0 to 22 characters in the salt), and ten rows (9 to 0 koi remaining). The last column in the decision tree (shown below) gives the odds (out of a million) for that number of koi remaining after the salt has been checked.

The decision tree for the second step is a bit more complicated. At each of the 31 character positions, there are two characters to consider, the character in the real hash, and the character in the fake hash. Each is independently and randomly selected, so there are 64*64=4096 possibilities for each of the 31 character positions. There are four possible outcomes:

Neither character is a koi. The probability (64-k)/64 * (64-k)/64 where k is the number of koi left. The number of koi remaining is unchanged.
The character in the real hash is not a koi, but the character in the fake hash is a koi. The probability is (64-k)/64 * k/64, and the outcome is failure.
The character in the real hash is a koi, and the character in the fake hash is not the exact same character. The probability is k/64 * 63/64, and the outcome is failure.
The character in the real hash is a koi, and the character in the fake hash matches. The probability is k/64 * 1/64, and the number of koi is reduced by one.

The decision tree starting with 9 koi looks like this:

The biggest difference compared to the salt decision tree is the addition of the failure row. The matching can fail at any column. Once a failure occurs, that failure carries forward to the last column in the decision tree. For example, the 55*9 failure in the column 1 carries forward as 55*9 * 64*64 in the column 2 (because regardless of which two characters occur at the second position, the result is still failure). The probability that the fake hash will match the real hash for a given k (the "success" rate) is 1 - failures for that k.

Hence, for each value of k, we can multiply the odds for that k (from step 1) by the success rate for that k. The table below shows the results.

The first column is k, the number koi that remain after the salt.
The second column is the odds for that number of koi.
The third column is the number of matches that occur because none of the koi appear in the tail of either hash.
The fourth column is the number of matches that occur when one or more koi successfully match in the tail.

All values are out of one million.

9  35647    3  1
8 156071   40  6
7 283872  216 27
6 281006  628 63
5 166489 1074 85
4  61078 1117 67
3  13837  705 30
2   1861  260  7
1    134   51  1
0      4    4  0

Matches with no koi present in the tail of either hash: 4098
Matches where 1 or more koi match in the tail:           287
Total matches per million:                              4385