Search code examples
tuplesz3smtz3pysmt-lib

Modeling nested tuples / sequences in z3


I am currently constructing a symbolic execution engine for a small subset of Python. The most complicated data structure supported by this subset are arbitrarily nested tuples, i.e., you can write something like x = (1, 2, (3, 4), 5). In my SE engine, all values are represented as z3 objects. Yesterday, I had a quite hard time trying to model these nested tuples in z3.

What I tried:

  • Arrays. Problem: Arrays are, well, arrays, i.e., they span a rectangular space, which is not suitable for my tuples.
  • Sequences. I like the sequences data type; however, it does not support nesting. For instance, you cannot write z3.Concat(z3.Unit(z3.IntVal(1)), z3.Unit(z3.Concat(z3.Unit(z3.IntVal(2)), z3.Unit(z3.IntVal(3))))), which raises z3.z3types.Z3Exception: b"Sort of function 'seq.++' does not match the declared type. Given domain: (Seq Int) (Seq (Seq Int))"
  • Lists. However, Lists have a single type T, which I can only instantiate to List or, e.g., Int. As far as I know, there is not something like union types or a general super sort in z3.
  • Uninterpreted functions. I think it should be possible to introduce a function tuple with sort Tuple, and even overload it for different argument lengths and sorts. However, I don't know how to extract an element of a tuple(1, 2, 3), since that expression is not recursively defined.

I would appreciate any help from z3 / SMT experts around here!

Thanks a lot in advance!

Tuple datatypes: Problem with "abstract" indexes & tuples

I also tried the idea suggested by @alias, defining a datatype for tuples. This works quite well, but there is one problem: How can I model the access to a tuple element if either the tuple or the index are not concrete, i.e., are (expressions containing) variables?

I can define, for instance, a 2-tuple of ints as:

Tup_II = z3.Datatype('Tuple_II')
Tup_II.declare('tuple', ('fst', z3.IntSort()), ('snd', z3.IntSort()))
Tup_II = Tup_II.create()

a_tuple = Tup_II.tuple(z3.IntVal(1), z3.IntVal(2))
print(a_tuple)  # tuple(1, 2)

tuple_concrete_access = z3.simplify(Tup_II.fst(a_tuple))
print(tuple_concrete_access)  # 1

That's all fine. I can also define nested tuples on top by embedding the Tup_II datatype:

Tup_IT = z3.Datatype('Tuple_IT')
Tup_IT.declare('tuple', ('fst', z3.IntSort()), ('snd', Tup_II))
Tup_IT = Tup_IT.create()

another_tuple = Tup_IT.tuple(z3.IntVal(0), a_tuple)
print(another_tuple)  # tuple(0, tuple(1, 2))

But to get access to an element, I need to know whether the index is 0 or 1 to choose the right accessor (fst, snd).

I tried to get inspiration from how sequence types behave:

int_seq = z3.Concat(z3.Unit(z3.IntVal(1)), z3.Unit(z3.IntVal(2)))
print(int_seq)  # Concat(Unit(1), Unit(2))

concrete_access = z3.simplify(int_seq[z3.IntVal(0)])
print(concrete_access)  # 1

concrete_access_2 = z3.simplify(int_seq[z3.IntVal(2)])

x = z3.Int("x")
abstract_access = z3.simplify(int_seq[x])
print(abstract_access)
# If(And(x >= 0, Not(2 <= x)),
#    seq.nth_i(Concat(Unit(1), Unit(2)), x),
#    seq.nth_u(Concat(Unit(1), Unit(2)), x))

So one idea is to define a Tuple_II.nth function. However, if we have a tuple like Tup_IT consisting of elements of different types, how do I define the target domain of this function? For example,

target_sort = # ???
tup_IT_nth = z3.Function("Tuple_IT.nth", z3.IntSort(), Tup_II, target_sort)

So for that, I'd need some kind of super sort of int and Tup_II: The same problem as for lists.

Any ideas? :)

What I'd like to have: Utility function for creating tuple types

Just assume that I can solve the sort problem for the getter function; then I wrote a nice utility function creating all the things you need for dealing with tuples in presence of abstract indexes:

def create_tuple_type(*sorts: z3.SortRef) -> \
        Tuple[z3.Datatype, Dict[int, z3.FuncDeclRef], z3.FuncDeclRef, z3.BoolRef]:
    """
    DOES NOT YET WORK WITH NESTED TUPLES!

    Example:
    >>> tuple_II, accessors, getter, axioms = create_tuple_type(z3.IntSort(), z3.IntSort())
    >>>
    >>> a_tuple = tuple_II.tuple(z3.IntVal(1), z3.IntVal(2))
    >>>
    >>> print(z3.simplify(accessors[0](a_tuple)))  # 1
    >>> print(z3.simplify(getter(a_tuple, z3.IntVal(0))))  # Tuple_II.nth(tuple(1, 2), 0)
    >>>
    >>> s = z3.Solver()
    >>> s.set("timeout", 1000)
    >>> s.add(z3.Not(z3.Implies(axioms, z3.IntVal(1) == getter(a_tuple, z3.IntVal(0)))))
    >>> assert s.check() == z3.unsat  # proved!
    >>>
    >>> s = z3.Solver()
    >>> s.set("timeout", 1000)
    >>> s.add(z3.Not(z3.Implies(axioms, z3.IntVal(0) == getter(a_tuple, z3.IntVal(0)))))
    >>> assert s.check() == z3.unknown  # not proved!
    :param sorts: The argument sorts for the tuple type
    :return: The new tuple type along with
      accessor functions,
      a generic accessor function,
      and axioms for the generic accessor
    """
    dt_name = "Tuple_" + "".join([str(sort)[0] for sort in sorts])
    datatype = z3.Datatype(dt_name)
    datatype.declare('tuple', *{f"get_{i}": sort for i, sort in enumerate(sorts)}.items())
    datatype = datatype.create()

    accessors = {i: getattr(datatype, f"get_{i}") for i in range(len(sorts))}

    target_sort = z3.IntSort()  # ??? <-- What to do here?

    get = z3.Function(f"{dt_name}.nth", datatype, z3.IntSort(), target_sort)
    get_in_range = z3.Function(f"{dt_name}.nth_i", datatype, z3.IntSort(), target_sort)
    get_not_in_range = z3.Function(f"{dt_name}.nth_u", datatype, z3.IntSort(), target_sort)

    x = z3.Int("x")
    t = z3.Const("t", datatype)

    axiom_1 = z3.ForAll(
        [t, x],
        get(t, x) == z3.If(
            z3.And(x >= z3.IntVal(0), x < z3.IntVal(len(sorts))),
            get_in_range(t, x),
            get_not_in_range(t, x)
        )
    )

    axiom_2 = None
    for idx in range(len(sorts)):
        axiom = get_in_range(t, z3.IntVal(idx)) == accessors[idx](t)
        if axiom_2 is None:
            axiom_2 = axiom
            continue

        axiom_2 = z3.And(axiom_2, axiom)

    axiom_2 = z3.ForAll([t], axiom_2)

    return datatype, accessors, get, z3.And(axiom_1, axiom_2)

The problem is the declaration of target_sort with the # ??? <-- What to do here? comment.


Solution

  • Why not just use tuples to model tuples? You can declare a generic tuple-type, and then instantiate it to multiple times to handle nested ones.

    There's an example here: https://rise4fun.com/z3/tutorialcontent/guide#h27 and the same can be coded in z3py as well. See https://ericpony.github.io/z3py-tutorial/advanced-examples.htm as well for further examples. See section 4.2.3 of http://smtlib.cs.uiowa.edu/papers/smt-lib-reference-v2.6-r2021-05-12.pdf for the official description of support for datatypes in SMTLib.

    Note that for each tuple kind you'll have, you'll have to declare a new tuple-type. SMT is not polymorphic in that way. This process is usually called monomorphisation. If it ends up that the language you are modeling has variables that can "change" shape (i.e., a simple 2-tuple becomes a 3-tuple after an assignment), you'll have to consider that a fresh variable and model it as such. (But this is no different than a variable being assigned an int, and then a boolean. SMTLib is "simply-typed," i.e., all variables always have a single declared type.)

    Give this idea a try and if you run into issues post code-samples.

    Dealing with types

    If you've a tuple with type IntxBool (i.e., first component is Int and second is Bool), then projection functions will be fst : IntxBool -> Int and snd : IntxBool -> Bool. If the projection function you want to use is not concrete, then you have a problem. Is the result an Int, or a Bool?

    This, however, has very little to do with symbolic execution. Think about how you can "type-check" such a function. In Haskell like notation:

    index :: (Int, Bool) -> Int -> XX
    index (a, _) 0 = a
    index (_, b) 1 = b
    

    What to put in XX? There's just no way to type this definition in simply-typed lambda-calculus. And that's the situation you are in: SMTLib is essentially a simply-typed calculus, where all these types have to resolve at "compile" time.

    So, how do you deal with this? The easiest answer is that you do not allow indexing with a symbolic value. This is the stance languages like Haskell, ML, etc. take. But in a language with dynamic types such as Lisp/Scheme/Python, this is not the case. So the question becomes, how do you model such a dynamically typed-language in a simply-typed system such as that of SMTLib.

    The approach you are taking so far, which is the most natural one, is a shallow-embedding of your Python subset into SMTLib. You're using SMTLib functions/types to model your language. This is the typical approach, and there's nothing wrong with it. It's simpler, takes advantage of all the features of the underlying language's type-system and features. And should be preferred if possible.

    Where this approach falls short is when your object-language and meta-language features don't match: Your object-language is essentially dynamically typed (I'm assuming you're following Python's conventions), where as SMTLib is statically and simply-typed. This mismatch means you can no longer use the shallow-embedding approach directly.

    The alternative is to use a universal type that represents all the terms in your object language. This is also known as a deep-embedding, where your language terms essentially become syntactic elements. The type of your object-language expression becomes a data-type in your meta-language itself. This is more work, obviously, but you are pretty much required to do this if you want to code in SMTLib. But keep in mind that this is the same problem you'd have if you were writing an interpreter for this subset in a high-level typed-language like Haskell/ML etc: The shallow embedding breaks once the mismatch in the typing strategy starts showing up.

    My recommendation would be to simply not allow symbolic indexing into your tuples. Only support concrete indexes. As you work out the quirks of this system you will no doubt find further discrepancies. At that point you can switch to a deep-embedding.

    Here's a good paper to read (in the Haskell context) discussing how to model domain-specific languages using both shallow and embedded styles. The details will be different in your case as you also want to support symbolic constructs, but the basic ideas apply: http://www.cse.chalmers.se/~josefs/publications/TFP12.pdf