Does Hypothesis generate the same dictionaries/sets with different iteration orders? Is iteration order preserved when replaying from the database?

Anne Archibald wrote on the hypothesis-users mailing list:

Python's dictionaries are now guaranteed to preserve the order in which keys are added, but you can still have two dictionaries that test equal but have a different iteration order. Sets don't have a consistent iteration order (normally differs from run to run) but can likewise differ even with identical contents. It would sometimes be useful to verify that the results of some operation (serialisation, say) are independent of iteration order for these objects.

Does hypothesis generate the same dictionaries/sets with different iteration orders (explicitly or incidentally)? Conversely, when hypothesis is re-running a test (as with @example() or during shrinking or checking for flakiness), is it guaranteed that the input objects will have the same iteration order?

I know that it is possible to generate pairs with (potentially) different iteration orders by applying permutations to a sorted list-ified object (as below), so it is possible to test one's code's independence of iteration order. But does hypothesis attempt to explore this easily forgotten way common Python objects can differ?

Concretely, I am serialising sets and dicts to json, where they will be kept in version control, and I want to ensure that the representation doesn't change unless the objects do. For sets this is easy:
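from hypothesis import strategies as st

@st.composite
def set_shuffled_pairs(draw, sets):
    # Draw a set, then rebuild it from a permutation of its sorted elements.
    left = draw(sets)
    left_list = sorted(left)  # requires orderable elements
    right = set(draw(st.permutations(left_list)))
    # left == right, but their iteration orders may differ.
    return left, right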
Unfortunately I have complicated nested objects (examples generated with st.from_type()), some of whose contents are dictionaries and sets, and which are serialised through dictionaries, and it will be a challenge to go through and randomise the orders of all the dictionaries and sets.

What is the answer to that?

Solution

When hypothesis is re-running a test (as with @example or during shrinking or checking for flakiness), is it guaranteed that the input objects will have the same iteration order?

It's complicated. Hypothesis generates collections by repeatedly choosing elements, and replaying the same choices always adds those elements in the same order. As a result, lists, dicts, etc. will all have a consistent iteration order.

However, sets are a special case: their iteration order is arbitrary within a particular process, and varies between processes due to hash seed randomization. Shrinking works in a single process, but replaying from the database in a future process might not! (see also this answer) Unfortunately there's nothing Hypothesis can do to help here; it's not even feasible to get the seed after the fact.
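To see the cross-process effect for yourself, here is a minimal sketch that runs the same set literal in child interpreters with different PYTHONHASHSEED values. The helper name set_order_for_seed and the fruit strings are made up for illustration, and it assumes Python 3.7+ for capture_output.

```python
import json
import os
import subprocess
import sys

# Code run in each child process: build a set of strings and print its
# iteration order as a JSON list.
SNIPPET = (
    "import json; "
    "print(json.dumps(list({'apple', 'banana', 'cherry', 'date', 'elderberry'})))"
)

def set_order_for_seed(seed):
    """Return the set's iteration order in a fresh interpreter (hypothetical helper)."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)

if __name__ == "__main__":
    for seed in (1, 2, 3):
        print(seed, set_order_for_seed(seed))
```

The same seed always reproduces the same order, while different seeds will usually (not necessarily always) give different orders for string elements. That is why a failing example replayed in a later process can iterate differently from the run that found it.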

Does hypothesis generate the same dictionaries/sets with different iteration orders (explicitly or incidentally)?

Yes, incidentally: we might happen to generate two collections using the same elements chosen in a different order. This is obviously rare, and only likely when there are few possibilities, although we have a few heuristic tricks which make it more likely than the combinatorics suggest.
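For concreteness, this is what "the same elements chosen in a different order" looks like for dicts. It is a plain-Python illustration with made-up keys, not something taken from Hypothesis internals:

```python
import json

# Two dicts built from the same items, inserted in a different order.
first = {"a": 1, "b": 2}
second = {"b": 2, "a": 1}

same_contents = first == second                        # equality ignores order
same_order = list(first) == list(second)               # iteration order does not
naive_match = json.dumps(first) == json.dumps(second)  # default output follows insertion order
```

Here same_contents is True while same_order and naive_match are both False, which is exactly the kind of difference a serialisation test would want to catch.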

However, if you have reason to check such inputs, I'd write a strategy to test them explicitly using st.permutations(), and perhaps st.shared() or st.data().
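As a rough sketch of such a strategy for dicts, mirroring the set_shuffled_pairs helper from the question: dict_shuffled_pairs and serialise are hypothetical names, and serialise is only a stand-in (here json.dumps with sort_keys=True) for whatever function you actually want to test.

```python
import json

from hypothesis import given, strategies as st

@st.composite
def dict_shuffled_pairs(draw, dicts):
    """Draw a dict plus an equal dict whose keys were inserted in a permuted order."""
    left = draw(dicts)
    # Permute the (key, value) pairs and rebuild; dicts keep insertion order.
    items = draw(st.permutations(list(left.items())))
    right = dict(items)
    return left, right

def serialise(obj):
    # Stand-in for the code under test (an assumption, not from the question).
    return json.dumps(obj, sort_keys=True)

@given(dict_shuffled_pairs(st.dictionaries(st.text(), st.integers())))
def test_serialisation_ignores_key_order(pair):
    left, right = pair
    assert left == right
    assert serialise(left) == serialise(right)
```

This only covers a single flat dict. For nested objects you would need to apply the same permutation idea at every level, which is where st.data() may be more convenient than a composite strategy.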