Should I use == for string comparison?

sorry if this is a weird question.

I was actually curious about timing attacks, so I have done a little research and understood the concept. I understood that, code like:

if token == password:
    print('Welcome')
else:
    print('Wrong password')

is equivalent to:

def equal(s1, s2):
    if len(s1) != len(s2):
        return False

    for i in range(len(s1)):
        if s1[i] != s2[i]:
            return False
    return True

PS - I am using python 3.9.2

So I have crafted a vulnerable code which looks like this:-

f = open('pass.txt', 'r')
password = f.read()
f.close()

def equal(s1, s2):
    if len(s1) != len(s2):
        return False

    for i in range(len(s1)):
        if s1[i] != s2[i]:
            return False
    return True

def login(upass):
    if equal(upass, password):
        print('Login successful')
    else:
        print('Login failed')

login()

This simple program will compare the user's given password (via the upass parameter) with the password stored in the file pass.txt in the same directory. If the password matches then it will greet the user with a welcome message, otherwise, it will alert the user that the login has failed.

Assumptions:-

Password is 4 characters long.
It is ONLY uppercase alphabets (no numeric values or special characters).

I was able to exploit the password by using the below method:-

def attack():

    leaked = ''

    for i in range(4):

        result = { letter : 0 for letter in ascii_uppercase }

        for _ in range(50000):
            for letter in ascii_uppercase:
                string = leaked + letter + '.' * ( 4 - len(leaked) - len(letter) )
                start = time_ns()
                login(string)
                end = time_ns()
                result[letter] += end - start

        leaked += sorted(result.items(), key = lambda item : item[1], reverse=True)[0][0]
        print(leaked)

I got the output as TEST which is correct. However, you can clearly see that I am not using == for string comparasion, in fact I am using its equivalent method. So I decided to switch back to == and check if my exploit is working. So I modified the equal() method to this:-

def equal(s1, s2):
    # if len(s1) != len(s2):
    #   return False

    # for i in range(len(s1)):
    #   if s1[i] != s2[i]:
    #       return False
    # return True

    if s1 == s2:
        return True
    else:
        return False

So using this code, when I called the attack method, to my surprise it gave me pretty weird results. I was given the following output when I ran it multiple times: AOAD, BVCB & LGAZ. This is clearly NOT the password stored in the pass.txt file.

So my question is, is == not vulnerable to timing attacks?

Solution

TL;DR yes it's vulnerable! However, you should still use == for comparison because that's the best thing there is.

Whether or not the implementation of str.__eq__() is vulnerable to timing attacks is easy to verify. Let's define four strings like so:

import random

# Lots of random characters from A to Z
s1 = ''.join(chr(random.randint(65, 90)) for _ in range(1000000))


s1c = s1                      # This string is equal and at the same memory location
s2 = ''.join(c for c in s1)   # This string is equal but not at the same memory loc
s3 = s1[:-1] + "?"            # This is not equal because of a mismatch at the end
s4 = "?" + s1[1:]             # This is not equal because of a mismatch at the start
s5 = s1[:-1000]               # This is not equal because of mismatched lengths

To time the equality check, we can use the timeit module.

import timeit

t1_1c = timeit.timeit('s1 == s1c', 'from __main__ import s1, s1c', number=10000)
t1_2  = timeit.timeit('s1 == s2', 'from __main__ import s1, s2', number=10000)
t1_3  = timeit.timeit('s1 == s3', 'from __main__ import s1, s3', number=10000)
t1_4  = timeit.timeit('s1 == s4', 'from __main__ import s1, s4', number=10000)
t1_5  = timeit.timeit('s1 == s5', 'from __main__ import s1, s5', number=10000)

I get the following numbers:

Variable	Value
`t1_1c`	0.0003349999997226405
`t1_2`	0.7978945999993812
`t1_3`	0.7638719000005949
`t1_4`	0.0011733000001186156
`t1_5`	0.0003372000001036213

Clearly, the strings at the same memory location report that they're equal almost instantly, but we don't expect that to be the case in a realistic situation. The strings that have an error at the start take orders of magnitude less time to report "not equal" than those that have an error at the end, so I don't think your findings are broadly applicable. This could be a version / OS issue, or maybe TEST is too short a string to really notice any of these problems.

Maybe changing the location of the mismatch will provide some insight? Such a long string seems overkill, so I'm going to reduce its size by an order of magnitide


s1 = ''.join(chr(random.randint(65, 90)) for _ in range(100000))

timings = []
for i in range(len(s1)):
    # Force a mismatch at index i
    s_temp = s1[0:i] + "?" + s1[i+1:]
    tm = timeit.timeit('s1 == s_temp', 'from __main__ import s1, s_temp', number=100)
    print(f"\r{i/len(s1)*100:.2f}".ljust(20, " "), end="")
    timings.append(tm)

Plotting this against the location of the mismatch gives the following (definitely not constant) plot:

The red point is when the strings are equal (no mismatch). It's evident that the further down the string the mismatch is, the longer the equality check takes. If we attribute the spread to the fact that my computer is working on other things too, and only look at the lower edge of this shape, it looks fairly linear (y-axis is logarithmic, linear axis here if you want), so it would lend some weight to the argument that the str.__eq__() method operates in linear time depending on how many characters it needs to check.

To wrap up,

No, the == or str.__eq__() method is not safe from timing attacks. Your password "TEST" was just too small to see the effect of comparison time.
Yes, you should use == for string comparison because that is the correct way to check string equality.
As @MisterMiyagi notes in a comment, the correct defense against timing attacks is to force your response to be delayed by more time than would be required to handle a long, wrong password instead of depending on other operations to provide that delay.