z3py: How to write a constraint for an array to be sorted?

I'm trying to create a constraint to create a "sorted" copy of an arrayOld(int, int). With "sorted" I mean, that:

for any x >= y which have a value stored in arrayOld, it should hold that Select(array, x) >= Select(array, y),
only those indices that were populated in arrayOld should store a value,
and the number of occurances of each value should be the same as in the old array.

I'm practically stuck on all three parts. For the first one I can write the constraint without the restriction of "having stored a value in arrayOld":

# 1. weak: for any x >= y it should hold that Select(array, x) >= Select(array, y)
arrayOld = Array("arrayOld", IntSort(), IntSort())
# some constraints on arrayOld
x, y = FreshInt(), FreshInt()
array = Array("array", IntSort(), IntSort())
constraint1_weak = ForAll(
    [x, y],  
    Implies(
        x >= y, 
        Select(array, x) >= Select(array, y)
    )
)

I was thinking of something like this for getting information about values stored in the old array, but I don't know how to write this _there_has_been_a_Store correctly:

arrayOld = Array("arrayOld", IntSort(), IntSort())
# some constraints on arrayOld
x, y = FreshInt(), FreshInt()
array = Array("array", IntSort(), IntSort())
constraint1 = (
    ForAll([x, y], Implies(And(x >= y, And(
        _there_has_been_a_Store(arrayOld, x, value),
        _there_has_been_a_Store(arrayOld, y, value)
    )), Select(array, x) >= Select(array, y)))
)

As I'm not sure how to add a constraint on whether there is a stored value in the array, I can't really do the others as well. I guess the third one might be even harder than that, right?

I have been able to draft a constraint ensuring, that at least the same elements occur in both arrays, excluding the number of occurances:

arrayOld = Array("arrayOld", IntSort(), IntSort())
# some constraints on arrayOld
x, y = FreshInt(), FreshInt()
array = Array("array", IntSort(), IntSort())
constraint3_weak = ForAll([x], 
    Exists([y], Select(array, y) == x) == Exists([y], Select(arrayOld, y) == x)
)

... but of course this is not quite what I want

Solution

SMTLib arrays are extensional. That is, you can't tell two arrays apart aside from checking if they store the same value in the same location, for all locations. In other words, whether an array "came to being" with random values, or have some indices stored is something you cannot tell. There's no formula that you can write that can distinguish two such arrays.

This is your fundamental problem when you say: "I don't know how to write this _there_has_been_a_Store correctly." The reason is simple: You simply can't.

So, to express if an array is a sorted version of another can only be done with respect to its entire domain, or some predefined subdomain of it. For instance, sorted within indices a and b for some constant values a and b. (And I think you already have the basics laid down to do that.) "Sorted only with respect to those elements that are stored" isn't something that's expressible in the extensional array logic supported by SMTLib.

Note that this is also true for sequences, which have finite length, as supported by z3 (https://microsoft.github.io/z3guide/docs/theories/Sequences/), for instance.

However, if you give up on native arrays, you can "devise" your own data-type for which you can do this sort of thing. It won't be pretty, nor it'll be easy to reason with; but it's possible. The idea is to use what's known as a "free" datatype, consisting of two constructors: Empty (corresponding to an array that has no items stored) and Stored, which will be recursive, and representing the storing of an item on an existing array. In SMTLib terms, something like:

  (declare-datatype StoredArray
      ((Empty)
       (Stored (storedIndex Int) (storedValue Int) (rest StoredArray))
      )
  )

Then you'd define a recursive function to check if an array is a sorted version of another. (Again, not terribly easy, but doable.)

Note that while all of this is "doable," it's more or less futile. Since all of this will involve recursive functions, any interesting proof about them will have to use induction, and SMT solvers don't do induction out-of-the-box. (Search stack-overflow for many questions on this very point.) In a sense, the internal notion of Arrays remain "decidable" precisely because they only rely on extensionality, and not allow you to see how they were "constructed." In your terms, there's no way to say if there-has-been-a-store.

A good paper to read on these topics is What's decidable about arrays. As with anything else, the design of an API for ease-of-use vs. what's provable about them is in constant contrast: Internal arrays are easy to use, but they can't express "length" or if "there has been a store" to them; but on the plus side we get decidability about interesting problems about them. You can design a data-structure that lets you decide whether there were "store"s, or keep track of an explicit length of an array, but then what you can decide becomes a question.

Also, don't feel bad! The kind of questions you're asking are precisely the kind of questions the theorem-proving community involves itself with. Proof automation vs. expressivity. I'm not sure what your goals are, but thinking about these questions is a good way to get started.