sorry if this is a weird question.
I was actually curious about timing attacks, so I have done a little research and understood the concept. I understood that, code like:
if token == password:
print('Welcome')
else:
print('Wrong password')
is equivalent to:
def equal(s1, s2):
if len(s1) != len(s2):
return False
for i in range(len(s1)):
if s1[i] != s2[i]:
return False
return True
PS - I am using python 3.9.2
So I have crafted a vulnerable code which looks like this:-
f = open('pass.txt', 'r')
password = f.read()
f.close()
def equal(s1, s2):
if len(s1) != len(s2):
return False
for i in range(len(s1)):
if s1[i] != s2[i]:
return False
return True
def login(upass):
if equal(upass, password):
print('Login successful')
else:
print('Login failed')
login()
This simple program will compare the user's given password (via the upass
parameter) with the password stored in the file pass.txt in the same directory. If the password matches then it will greet the user with a welcome message, otherwise, it will alert the user that the login has failed.
Assumptions:-
I was able to exploit the password by using the below method:-
def attack():
leaked = ''
for i in range(4):
result = { letter : 0 for letter in ascii_uppercase }
for _ in range(50000):
for letter in ascii_uppercase:
string = leaked + letter + '.' * ( 4 - len(leaked) - len(letter) )
start = time_ns()
login(string)
end = time_ns()
result[letter] += end - start
leaked += sorted(result.items(), key = lambda item : item[1], reverse=True)[0][0]
print(leaked)
I got the output as TEST
which is correct. However, you can clearly see that I am not using ==
for string comparasion, in fact I am using its equivalent method. So I decided to switch back to ==
and check if my exploit is working. So I modified the equal()
method to this:-
def equal(s1, s2):
# if len(s1) != len(s2):
# return False
# for i in range(len(s1)):
# if s1[i] != s2[i]:
# return False
# return True
if s1 == s2:
return True
else:
return False
So using this code, when I called the attack
method, to my surprise it gave me pretty weird results. I was given the following output when I ran it multiple times: AOAD
, BVCB
& LGAZ
. This is clearly NOT the password stored in the pass.txt file.
So my question is, is ==
not vulnerable to timing attacks?
TL;DR yes it's vulnerable! However, you should still use ==
for comparison because that's the best thing there is.
Whether or not the implementation of str.__eq__()
is vulnerable to timing attacks is easy to verify. Let's define four strings like so:
import random
# Lots of random characters from A to Z
s1 = ''.join(chr(random.randint(65, 90)) for _ in range(1000000))
s1c = s1 # This string is equal and at the same memory location
s2 = ''.join(c for c in s1) # This string is equal but not at the same memory loc
s3 = s1[:-1] + "?" # This is not equal because of a mismatch at the end
s4 = "?" + s1[1:] # This is not equal because of a mismatch at the start
s5 = s1[:-1000] # This is not equal because of mismatched lengths
To time the equality check, we can use the timeit
module.
import timeit
t1_1c = timeit.timeit('s1 == s1c', 'from __main__ import s1, s1c', number=10000)
t1_2 = timeit.timeit('s1 == s2', 'from __main__ import s1, s2', number=10000)
t1_3 = timeit.timeit('s1 == s3', 'from __main__ import s1, s3', number=10000)
t1_4 = timeit.timeit('s1 == s4', 'from __main__ import s1, s4', number=10000)
t1_5 = timeit.timeit('s1 == s5', 'from __main__ import s1, s5', number=10000)
I get the following numbers:
Variable | Value |
---|---|
t1_1c |
0.0003349999997226405 |
t1_2 |
0.7978945999993812 |
t1_3 |
0.7638719000005949 |
t1_4 |
0.0011733000001186156 |
t1_5 |
0.0003372000001036213 |
Clearly, the strings at the same memory location report that they're equal almost instantly, but we don't expect that to be the case in a realistic situation.
The strings that have an error at the start take orders of magnitude less time to report "not equal" than those that have an error at the end, so I don't think your findings are broadly applicable. This could be a version / OS issue, or maybe TEST
is too short a string to really notice any of these problems.
Maybe changing the location of the mismatch will provide some insight? Such a long string seems overkill, so I'm going to reduce its size by an order of magnitide
s1 = ''.join(chr(random.randint(65, 90)) for _ in range(100000))
timings = []
for i in range(len(s1)):
# Force a mismatch at index i
s_temp = s1[0:i] + "?" + s1[i+1:]
tm = timeit.timeit('s1 == s_temp', 'from __main__ import s1, s_temp', number=100)
print(f"\r{i/len(s1)*100:.2f}".ljust(20, " "), end="")
timings.append(tm)
Plotting this against the location of the mismatch gives the following (definitely not constant) plot:
The red point is when the strings are equal (no mismatch). It's evident that the further down the string the mismatch is, the longer the equality check takes. If we attribute the spread to the fact that my computer is working on other things too, and only look at the lower edge of this shape, it looks fairly linear (y-axis is logarithmic, linear axis here if you want), so it would lend some weight to the argument that the str.__eq__()
method operates in linear time depending on how many characters it needs to check.
To wrap up,
==
or str.__eq__()
method is not safe from timing attacks. Your password "TEST"
was just too small to see the effect of comparison time.==
for string comparison because that is the correct way to check string equality.