Search code examples
pythonz3smt

Z3 takes an unexpected amount of time for an XOR cipher


I'm attempting to learn Z3 by break a simple (3 character) XOR-cipher: https://projecteuler.net/problem=59

So far I've specified some simple requirements, like the plaintext needing to be equivalent to the ciphertext ^ password, that the plaintext must consist of 7 bit ascii and that the password uses only lowercase characters.

ciphertext_bytes = parse_input("p059_cipher.txt")
set_param("parallel.enable", True)
set_param("parallel.threads.max", os.cpu_count())
s = Solver()
ctx = s.ctx

password = IntVector('x', 3, ctx)
ciphertext = IntVector('c', len(ciphertext_bytes), ctx)
plaintext = IntVector('p', len(ciphertext_bytes), ctx)

for x in password:
    s.add(x >= 97)
    s.add(x <= 122)

for i, value in enumerate(ciphertext_bytes):
    ciphertext[i] = Int(value, ctx)
    password_index = i % len(password)

    password_char = BitVecRef(Z3_mk_int2bv(ctx.ref(), 8, password[password_index].as_ast()), ctx)
    ciphertext_char = BitVecRef(Z3_mk_int2bv(ctx.ref(), 8, ciphertext[i].as_ast()), ctx)
    plaintext_char = BitVecRef(Z3_mk_int2bv(ctx.ref(), 8, plaintext[i].as_ast()), ctx)

    s.add(password_char ^ ciphertext_char == plaintext_char)

    s.add(plaintext[i] >= 0)
    s.add(plaintext[i] <= 127)
    s.add(ciphertext[i] >= 0)
    s.add(ciphertext[i] <= 255)

print(s.check())
print(s.model())

The s.check() call does not terminate in any reasonable amount of time however.

As the specified problem is (currently still) brute-forceable with 26*3 tries (guessing each of the password chars separately and checking the plaintext after) I must have made a mistake somewhere.

Why is this code slow?

Why does Z3 not use multi-threading here?


Solution

  • This isn't really a suitable problem for an SMT solver, alas. The issue here is not that you can't solve the puzzle using SMT; but rather it does not buy you anything new. Note that in an XOR based encryption, the equality:

      cipher = plain ^ key
    

    has a solution for every value of cipher and key: You simply get a plain-text as a sequence of bytes, and the SMT solver has no way of making sure that it's valid English. (Yes, you can constrain it to be 7-bits per character etc., but that's really not cutting down the search space in any meaningful way.) The trick here is to make sure the plain text is "meaningful" English, as is the password. But there's no inherent knowledge in the SMT solver to tell you if a sequence of bytes correspond to meaningful English text, or text in any known natural language.

    The best way to solve this problem, then, is to use classic decipher technology; like frequency analysis, dictionary based enumeration, etc. And none of that is really going to gain anything from an SMT solver. Alternatively, given this is an Euler-project problem, your best best is probably to scan through the dictionary file in your computer and exhaustively search for a suitable solution; since there're only so many 3-letter words that can be used as valid passwords.

    Regarding the "speed" issue you're observing: Note that int2bv is an expensive operation. You should avoid it, and simply use bit-vectors. But again, this will not help you here, because you'll "quickly" get a non-sense solution.