This is the 'check if hashes are equal' code of BouncyCastle's v1.66 release of their OpenBSD BCrypt implementation:
for (int i = 0; i != sLength; i++){
isEqual &= (bcryptString.indexOf(i) == newBcryptString.indexOf(i));
}
where sLength
is guaranteed 60 (see line 268), and bcryptString is a full openbsd-style bcrypt string, such as for example $2y$10$.vGA1O9wmRjrwAVXD98HNOgsNpDczlqm3Jq7KnEd1rVAGv3Fykk1a
.
The error is the used method: They meant to use charAt
.
The intent of this loop is to check, for each position from 0 to 59, if the character at position i
in a
is the same character at position i
in b
.
But, due to erroneous use of indexOf(int)
, instead, this checks if the position of the first character with unicode i
in a
matches the position of the first character with unicode i
in b
, with 'not in string' matching 'not in string'.
Example: "Hello".indexOf(101)
returns 1
(java is 0-based, 101 is the unicode of e
, and e
is the second character). "Hello".indexOf(0)
returns -1
(because there is no unicode-0 in "Hello").
I'm trying to do the math to figure out the answer to following question:
If you try to log in for a given user by using an arbitrarily chosen password instead of the actual password (and let's posit that the odds that you so happen to choose the exact password are zero), what are the odds that this algorithm erroneously considers the arbitrarily chosen password as 'equal'?
As far as I can tell, this: $2y$10$.vGA1O9wmRjrwAVXD98HNOgsNpDczlqm3Jq7KnEd1rVAGv3Fykk1a
breaks down as follows:
$2y$
- constant - it's a marker that means 'bcrypt string', pretty much.10
- constant per server - number of rounds used; it's 10 almost everywhere, and will always have the same value for any given user's passhash.$
- constant, again..vGA1O9wmRjrwAVXD98HNO
(22 characters): The 16-byte salt padded with 2 zero bytes, then base-64ed, and then chuck out the last 2 characters. This can be used to reconstruct the salt.bcrypt(rounds, concat(salt+utf8encode(pass)))
, base64-encoded, and toss out the last character.The faulty algorithm will end up checking if the position of the first occurrence of all characters in unicode range 0 to 59 (inclusive) is the same for both hashes ("realhash" and "inhash"). If all 60 have the same 'pos of first occurrence' for both realhash and inhash, the algorithm considers the passwords equal.
Of all unicode symbols between 0 and 59, the only ones that could even be in this string are 0123456789$./
.
However, of those, the $012
are irrelevant: For any passhash.indexOf('$')
, the answer is 0. For any passhash.indexOf('1')
, the answer is 4. Same goes for 0 and 2. That leaves just 9 characters that could possibly result in the algorithm saying "inhash" is not equal to "realhash": 3456789./
.
To figure out what the odds are, we need for each of these 9 characters to not end up being a differentiator. To figure out what the odds are that one specific character (let's say the '3') fails to differentiate, then that's just 1-Z
with Z being the odds that the '3' is enough to differentiate.
(1-Z)^9
.Z = Q * ( (U * (1 - (V * W))) + ((1-U) * (1-V)) )
Q = (63/64)^22 ~= 0.707184
U = 1-(63/64)^31 ~= 0.386269
V = 1-(63/64)^31 ~= 0.386269
W = 1/31 ~= 0.032258
(1 - (V * W)) ~= 0.987539
(1-U) * (1-V) ~= 0.376666
Z ~= 0.536131
Chance that the 3 fails to differentiate:
(1-Z) ~= 0.463868
Chance that all of 3456789./ fail:
(1-Z)^9 ~= 0.00099438
Thus, roughly 0.001: about one in a thousand odds that the algorithm says that 2 passwords are equal when they are not.
Am I missing anything significant?
NB: BouncyCastle's most recent public version has fixed this bug. CVE-2020-28052 tracks the problem).
For problems like this, a Monte Carlo simulation is a useful sanity check. The result I got from the simulation was 0.0044, about 4 times higher than the calculated result in the question. That seemed high to me, so I did some debugging to see where that result was coming from.
It turns out that the vast majority of the false matches are due to one very simple mechanism: the 22 character salt eliminates some of the characters-of-interest, and the remaining characters-of-interest do not appear in the rest of the hash.
As mentioned in the question, there are 9 characters-of-interest: 3456789./
If any of those appear in the salt, then the indexOf()
for that character will match, and that character is no longer of interest. Monte Carlo shows that on average, 2.6 of the 9 characters appear in the salt, and are eliminated from consideration. That makes sense because the salt contains at most 22 of the base-64 characters, so about one third. Here's a sample run of the Monte Carlo simulation:
0 35645
1 156228
2 283916
3 281018
4 166381
5 61024
6 13791
7 1850
8 139
9 3
The first column is the number of characters-of-interest that were eliminated by the salt. The second column is the number of times that occurred out of 1 million attempts. For example, in 3 out of a million attempts, the salt eliminated all 9 characters-of-interest, which then guarantees a false match.
In 139 out of a million attempts, the salt eliminated 8 of the 9 characters-of-interest. The remaining character then either needs to match in the two hash strings, or it needs to be absent from both hash strings. The odds that it will be absent are (63/64) ^ 62 = 0.377
.
So we can augment the table of results like so (these are Monte Carlo results):
0 35645
1 156228
2 283916
3 281018
4 166381
5 61024
6 13791
7 1850
8 139 53 0.386053
9 3 3 1.000000
The second to last line can be interpreted as follows: 8 characters-of-interest were eliminated by the salt in 139 out of 1 million attempts. Of the 139, 53 (or 38.6%) resulted in a match because the single remaining character-of-interest did not appear in the last 31 characters of either hash string.
Here are the full results:
0 35645 2 0.000070 0
1 156228 39 0.000250 5
2 283916 214 0.000757 25
3 281018 628 0.002237 64
4 166381 1056 0.006349 83
5 61024 1114 0.018260 68
6 13791 702 0.050936 32
7 1850 265 0.143467 7
8 139 53 0.386053 0
9 3 3 1.000000 0
false matches due to elimination and absence: 4076
false matches due to elimination, and matching: 284
total false matches: 4360 out of 1 million
The last column is the number of times one or more characters of interest matched indexes in the final 31 characters of the hash strings.
There are two steps to analyzing the problem:
(1): I'm using "koi" as the abbreviation since the spell checker thinks it's a fish, and will leave it alone.
The decision tree for the first step looks like this:
The column headers are the number of characters in the salt seen so far. The column footers are the divisor for that column. The row labels are the number of koi left.
Column 0 of the tree:
1/1
is the probability that 9 koi remain after 0 characters have been seen.Column 1 of the tree:
55/64
is the probability that the first character of the salt is not a koi.9/64
is the probability that the first character of the salt is a koi, leaving 8 koi.Column 2 of the tree:
55*55/4096
is the probability that neither of the first two characters is a koi.55*9/4096
is the probability of a non-koi followed by a koi, leaving 8 koi left.9*56/4096
is the probability that the first character was a koi (leaving 8) followed by a non-koi.9*8/4096
is the probability that the first two characters are both koi, leaving 7 koi.The full decision tree has 23 columns (0 to 22 characters in the salt), and ten rows (9 to 0 koi remaining). The last column in the decision tree (shown below) gives the odds (out of a million) for that number of koi remaining after the salt has been checked.
9 35647
8 156071
7 283872
6 281006
5 166489
4 61078
3 13837
2 1861
1 134
0 4
The decision tree for the second step is a bit more complicated. At each of the 31 character positions, there are two characters to consider, the character in the real hash, and the character in the fake hash. Each is independently and randomly selected, so there are 64*64=4096
possibilities for each of the 31 character positions. There are four possible outcomes:
(64-k)/64 * (64-k)/64
where k
is the number of koi left. The number of koi remaining is unchanged.(64-k)/64 * k/64
, and the outcome is failure.k/64 * 63/64
, and the outcome is failure.k/64 * 1/64
, and the number of koi is reduced by one.The decision tree starting with 9 koi looks like this:
The biggest difference compared to the salt decision tree is the addition of the failure row. The matching can fail at any column. Once a failure occurs, that failure carries forward to the last column in the decision tree. For example, the 55*9
failure in the column 1 carries forward as 55*9 * 64*64
in the column 2 (because regardless of which two characters occur at the second position, the result is still failure). The probability that the fake hash will match the real hash for a given k
(the "success" rate) is 1 - failures
for that k
.
Hence, for each value of k
, we can multiply the odds for that k
(from step 1) by the success rate for that k
. The table below shows the results.
k
, the number koi that remain after the salt.All values are out of one million.
9 35647 3 1
8 156071 40 6
7 283872 216 27
6 281006 628 63
5 166489 1074 85
4 61078 1117 67
3 13837 705 30
2 1861 260 7
1 134 51 1
0 4 4 0
Matches with no koi present in the tail of either hash: 4098
Matches where 1 or more koi match in the tail: 287
Total matches per million: 4385