Tags: python-3.4, z3, z3py

ord() Function or ASCII Character Code of String with Z3 Solver


How can I convert a z3.String to a sequence of ASCII values?

For example, here is some code that I thought would check whether the ASCII values of all the characters in the string add up to 100:

import z3

def add_ascii_values(password):
    return sum(ord(character) for character in password)

password = z3.String("password")
solver = z3.Solver()
ascii_sum = add_ascii_values(password)
solver.add(ascii_sum == 100)

print(solver.check())
print(solver.model())

Unfortunately, I get this error:

TypeError: ord() expected string of length 1, but SeqRef found

It's apparent that ord doesn't work with z3.String. Is there something in Z3 that does?


Solution

  • 2022 Update

    The answer below, written in 2018, no longer applies: strings in SMTLib received a major update, so the code given here is outdated. I'm keeping it for archival purposes, and in case you have a really old z3 that you cannot upgrade for some reason. See the other answer for a variant that works with the new Unicode strings in SMTLib: https://stackoverflow.com/a/70689580/936310

    Old Answer from 2018

    You're conflating Python strings and Z3 Strings, and unfortunately the two are quite different types.

    In Z3py, a String is simply a sequence of 8-bit values, and what you can do with a Z3 String is quite limited; for instance, you cannot iterate over its characters the way your add_ascii_values function does. See this page for the allowed functions: https://rise4fun.com/z3/tutorialcontent/sequences (It lists the functions in SMTLib notation, but the equivalent ones are available from the z3py interface.)

    There are a few important restrictions to keep in mind when working with Z3 sequences and strings:

    • You have to be very explicit about lengths. In particular, you cannot sum over strings of arbitrary symbolic length. A few operations work without an explicit length, such as regex matching and substring extraction, but these are limited.

    • You cannot extract a character out of a string. This is an oversight in my opinion, but SMTLib currently has no way of doing so. Instead, you get a string of length 1. This causes a lot of headaches when programming, but there are workarounds; see below.

    • Any time you loop over a string/sequence, you have to stop at a fixed bound. There are ways to write code that covers "all strings up to length N" for some constant N, but they do get hairy.

    With all this in mind, I'd code your example as follows, restricting password to be exactly 10 characters long:

    # Requires an old z3 where String is a sequence of 8-bit values (see the 2022 update above).
    from z3 import *
    
    s = Solver()
    
    # Work around the fact that z3 has no way of giving us an element at an index. Sigh.
    ordHelperCounter = 0
    def OrdAt(inp, i):
        global ordHelperCounter
        v = BitVec("OrdAtHelper_%d_%d" % (i, ordHelperCounter), 8)
        ordHelperCounter += 1
        s.add(Unit(v) == SubString(inp, i, 1))
        return v
    
    # Your original function, but note the added length parameter and the use of Sum
    def add_ascii_values(password, length):
        return Sum([OrdAt(password, i) for i in range(length)])
    
    # We'll have to force a constant length
    length = 10
    password = String("password")
    s.add(Length(password) == length)
    ascii_sum = add_ascii_values(password, length)
    s.add(ascii_sum == 100)
    
    # Also require characters to be printable so we can view them:
    for i in range(length):
        v = OrdAt(password, i)
        s.add(v >= 0x20)
        s.add(v <= 0x7E)
    
    print(s.check())
    print(s.model()[password])
    

    The OrdAt function works around the inability to extract characters: it introduces a fresh 8-bit variable and constrains the one-character substring at position i to equal it. Also note that we use Sum instead of sum, and that every "loop" has a fixed iteration count. I also added constraints making all the ASCII codes printable, for convenience.

    When you run this, you get:

    sat
    ":X|@`y}@@@"
    

    Let's check that it's indeed correct:

    >>> len(":X|@`y}@@@")
    10
    >>> sum(ord(character) for character in ":X|@`y}@@@")
    868
    

    So we did get a string of length 10, but why don't the ords sum to 100? Remember that sequences are composed of 8-bit values, so the arithmetic is done modulo 256. The sum actually is:

    >>> sum(ord(character) for character in ":X|@`y}@@@") % 256
    100
    
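The wraparound can be reproduced in plain Python by masking the running total to 8 bits after every addition, which is what adding 8-bit bit-vectors amounts to. This is only a simulation for illustration; the helper name bv8_sum is hypothetical.

```python
def bv8_sum(text):
    """Hypothetical helper: sum character codes the way an 8-bit
    bit-vector adder would, reducing every intermediate result mod 2**8."""
    total = 0
    for ch in text:
        total = (total + ord(ch)) & 0xFF  # keep only the low 8 bits
    return total


password = ":X|@`y}@@@"
print(sum(ord(c) for c in password))  # unbounded sum: 868
print(bv8_sum(password))              # 8-bit sum: 868 % 256 == 100
```

Masking after each step gives the same result as taking the full sum modulo 256 at the end, which is why z3 reported this string as a solution.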

    To avoid overflow, you can either use larger bit-vectors or, more simply, use Z3's unbounded integer type Int. For the latter, apply the BV2Int function by changing add_ascii_values to:

    def add_ascii_values(password, length):
        return Sum([BV2Int(OrdAt(password, i)) for i in range(length)])
    
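If you would rather keep bit-vectors than switch to Int, the "larger bit-vectors" option means zero-extending each 8-bit code to a width at which the sum cannot overflow. In z3py, ZeroExt(k, v) adds k zero bits to v. Picking the width is simple arithmetic, sketched here in plain Python with a hypothetical helper name:

```python
def sum_width(num_chars, char_bits=8):
    """Hypothetical helper: smallest bit-width that can hold the sum of
    `num_chars` unsigned `char_bits`-bit values without wrapping around."""
    max_sum = num_chars * (2**char_bits - 1)  # every character at its maximum
    return max_sum.bit_length()


print(sum_width(10))  # 10 * 255 = 2550 needs 12 bits
```

For the 10-character password, widening each code by sum_width(10) - 8 = 4 bits (for example ZeroExt(4, OrdAt(password, i))) would make the bit-vector sum exact.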

    Now we'd get:

    unsat
    

    That's because each character has a value of at least 0x20 (32), so 10 characters sum to at least 320. There is no way to make them add up to 100, and z3 is telling us precisely that. If you raise the target to something reachable (between 320 and 1260, since each printable code is at most 0x7E = 126), you'll start getting proper values.
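The bound z3 enforces is easy to confirm by hand. With n characters each between 0x20 and 0x7E, the sum lies between 32n and 126n, and every value in that range is reachable. The sketch below is plain Python with hypothetical helper names. It computes the range and greedily builds one witness string, which can help sanity-check a target before handing it to the solver.

```python
LOW, HIGH = 0x20, 0x7E  # printable ASCII range used in the answer


def reachable_range(n):
    """Hypothetical helper: inclusive range of sums for n printable characters."""
    return n * LOW, n * HIGH


def witness(n, target):
    """Hypothetical helper: build one printable string of length n whose
    codes sum to target, or return None if target is out of reach."""
    lo, hi = reachable_range(n)
    if not lo <= target <= hi:
        return None
    codes = [LOW] * n
    extra = target - lo
    for i in range(n):
        bump = min(extra, HIGH - LOW)  # raise this character as far as needed
        codes[i] += bump
        extra -= bump
    return "".join(chr(c) for c in codes)


print(reachable_range(10))  # (320, 1260): 100 is out of reach, hence unsat
print(witness(10, 100))     # None
```

Any target in the printed range, such as 868, yields a 10-character printable string, mirroring what z3 finds once the goal is reachable.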

    Programming with z3py is different from regular Python programming, and z3 String objects are quite different from Python's own strings. Note that the sequence/string logic isn't even standardized by SMTLib yet, so things can change. (In particular, I'm hoping they'll add functionality for extracting elements at an index!)

    Having said all this, going over the tutorial at https://rise4fun.com/z3/tutorialcontent/sequences would be a good start for getting familiar with them. Feel free to ask further questions.