Search code examples
arrayssequencez3smtz3py

How to calculate the average of a sequence in Z3(Python)


I am trying to specify a concrete formula in Z3-Python.

Exists i::Integer s.t. (0<=i<|arr|) & (avg(arr)+t<arr[i])

This means: whether there is a position i::0<i<|arr| in the array whose value a[i] is greater than the average of the array avg(arr) plus a given threshold t.

I posted at Theory of arrays in Z3: (1) model is difficult to understand, (2) do not know how to implement functions and (3) difference with sequences how I approached this using arrays. Among others, I need to define a function for getting the average of the array, and another one for its length; however, approaching this problem using Z3's arrays does not seem to be a good idea.

Thus, I 'emulated' it using sequences:

t = Int('t')
avg_seq = Int('avg_seq')
i = Int('i')

seq = Const('seq', SeqSort(IntSort()))

phi = Exists(i, And([(And(2<t, t<10)), (And(0 <= i, i< Length(seq))), ((t+avg_seq<seq[i]))]))

s = Solver()
s.add(phi)
print(s.check())
print(s.model())

Where one model I got is the following:

sat
[avg_seq = 7715,
 t = 3,
 seq = Unit(7719),
 seq.nth_u = [else -> 4]]

As you can see, I solved the problem of how to calculate Length, since it is a primitive in Z3.

However, how can I create an average function? Is there any primitives for it? I guess there is not, so I planned to create a recursive version of it. The pseudo-code of my idea is as follows:

Avg(Seq) = Sum(Seq)/Length(Seq)
Sum(Seq) = Head(Seq) + Sum(Tail(Seq))

Is this idea right? How can I implement this? I was recommended to read the example of RecAddDefinition at Converting `appendo` relation from smt2 to python, but I feel pretty lost, specially when I see constructions like Const.

I also have one extra doubt: I understood that when defininf a Exists(i, ...), the top-level declaration of i is unused; however, if I do not declare it, I receive the error: name 'i' is not defined.


Solution

  • This should get you started:

    from z3 import *
    
    IntSeqSort = SeqSort(IntSort())
    
    sumArray    = RecFunction('sumArray', IntSeqSort, IntSort())
    sumArrayArg = FreshConst(IntSeqSort)
    
    RecAddDefinition( sumArray
                    , [sumArrayArg]
                    , If(Length(sumArrayArg) == 0
                        , 0
                        , sumArrayArg[0] + sumArray(SubSeq(sumArrayArg, 1, Length(sumArrayArg) - 1))
                        )
                    )
    
    def avgArray(arr):
        return ToReal(sumArray(arr)) / ToReal(Length(arr))
    

    In the above, we have defined the recursive function to sum an array, and then avgArray which returns the average. Here's a use case. Let's ask it to find a sequence of length 4, whose average is 3:

    arr = Const('arr', IntSeqSort)
    
    s = Solver()
    s.add(Length(arr) == 4)
    s.add(avgArray(arr) == 3)
    
    if s.check() == sat:
        print(s.model()[arr])
    else:
        print("Not sat")
    

    When I run this, I get:

    Concat(Unit(-10451),
           Concat(Unit(31365),
                  Concat(Unit(-10451), Unit(-10451))))
    

    In more familiar notation, z3 found us:

    [-10451, 31365, -10451, -10451]
    

    You can easily see that it satisfies the constraints, it has length 4 and the average of the numbers is 3.

    You can take this and convert it to model your original formula in the usual way.

    Regarding the existential: This is a z3 hack: To use Exists and Forall you have to define the bound-variables at the top-level, but otherwise what you defined at top and what is used locally is unrelated. Think of it as a way of telling z3 that this variable exists just for internal bookkeeping purposes, but otherwise is irrelevant. If it helps, you can give it a really unique name (like localExistsVariable) so it doesn't confuse you in any other way. (And don't use it anywhere else!)