Search code examples
pythonsortingpython-3.4

sorted() order unstable under certain conditions. Is this intentional?


sorted() documentation (emphasis mine):

The built-in sorted() function is guaranteed to be stable. A sort is stable if it guarantees not to change the relative order of elements that compare equal [...]

In the code below, keys 8 and 16 have the same value, that is they would compare equal by the key=lambda x: d[x]:

d = {8: 0, 16: 0, 'a': 1}    

for i in range(200):    
    print(sorted(d, key=lambda x: d[x]))

# Prints always the same 
# [8, 16, 'a']
# [8, 16, 'a']
# [8, 16, 'a']
# ......

Now lets try a small modification to the dictionary:

Example 1:

for i in range(10):

    if i == 5:

        del d[8]
        del d[16]

        d.update({16: 0})
        d.update({8: 0})

    print(sorted(d, key=lambda x: d[x]))

# Prints: 
# [8, 16, 'a']
# [8, 16, 'a']
# [8, 16, 'a']
# [8, 16, 'a']
# [8, 16, 'a'] 
# [16, 8, 'a']    <----- it changed the order
# [16, 8, 'a']
# [16, 8, 'a']
# [16, 8, 'a']
# [16, 8, 'a']

id(d) remains the same. Also the contents remain the same. What changes is the order of key insertion.

Note:
Those keys were chosen so that they cause hash collisions:

whether 8 occupies a location or 16 is determined by which one arrived at the party first


Example 2:

d = {8: 0, 16: 0, 'a': 1}
c = {16: 0, 8: 0, 'a': 1}

print(sorted(d, key=lambda x: d[x]))
print(sorted(c, key=lambda x: d[x]))

I know that d and c are different objects here:

id(d)  # 140067442777736
id(c)  # 140067442811016

But I would expect that sorted() treats objects that are d == c the exact same way.


Example 3: A different example would be the following:

d = {'a': 0, 'b': 0, 'c': 0, 'd': 1}

print(sorted(d, key=lambda x: d[x]))

Running this on multiple different runs (not with a for loop) would give a different order each time.

Note: That's assuming you use Python 3.3 or higher and your PYTHONHASHSEED=random


Question:
The order is not stable:

  • for the same object that gets its order modified (example 1).
  • for 2 objects that compare equal (example 2).
  • for different runs of the exact same code (example 3).

Is the order instability mentioned above a bug or am I missing something? Do I misunderstand the documentation by expecting all 3 examples to have a stable order?


Solution

  • It is not a bug, you missed something. You manipulated the dictionary changing the order of iteration, and it is that order that is kept stable by sorted(). Or put differently, sorted() keeps the input order stable, and you changed that input order by altering the dictionary.

    Note that sorted() doesn't 'see' a dictionary here. It is passed a sequence of keys, just as if you used list() on the dictionary first. In this case, both 8 and 16 hash to the same hash-table slot. Which one is listed first depends on the order of insertions:

    >>> d1 = {}
    >>> d1[8] = None
    >>> d1[16] = None
    >>> d1
    {8: None, 16: None}
    >>> list(d1)
    [8, 16]
    >>> d2 = {}
    >>> d2[16] = None
    >>> d2[8] = None
    >>> d2
    {16: None, 8: None}
    >>> list(d2)
    [16, 8]
    

    Calling list() on the two dictionaries shows that the keys are listed in a different order. sorted() just maintained that order as it iterates over the dictionary, since they both are otherwise sorted in the same location, by value. This is exactly what the documentation tells you would happen.

    Note that dictionary equality has no role to play here, at all. That's not what the compare equal part of the documentation is referring to. That only refers to the elements in the sequence; if the elements are equal (or in this case, if the key(element) values are equal), then those elements retain their relative order.

    So to focus on possible things you missed here:

    • The signature of the method is sorted(iterable[, key][, reverse]); the key word there is iterable. The first line of the documentation:

      Return a new sorted list from the items in iterable.

      Emphasis mine; it is the items from the iterable that are sorted. The Python glossary defines iterable as:

      An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() or __getitem__() method. [...] When an iterable object is passed as an argument to the built-in function iter(), it returns an iterator for the object. This iterator is good for one pass over the set of values.

      Anything that takes an iterable basically will call iter() on it to loop over the sequence produced.

    • Dictionaries happen to be iterable, and iterating gives you the keys. See the mapping types documentation:

      iter(d)
      Return an iterator over the keys of the dictionary. This is a shortcut for iter(d.keys()).

      The dict.keys() documentation then points to the dictionary views section, which states:

      iter(dictview)
      Return an iterator over the keys, values or items (represented as tuples of (key, value)) in the dictionary.

      Keys and values are iterated over in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions. If keys, values and items views are iterated over with no intervening modifications to the dictionary, the order of items will directly correspond. This allows the creation of (value, key) pairs using zip(): pairs = zip(d.values(), d.keys()). Another way to create the same list is pairs = [(v, k) for (k, v) in d.items()].

      Again, emphasis mine. So sorted() iterates, taking the items to sort. Dictionaries, when iterated, produce keys whose order is not stable.

    • Finally, the section you quoted, never contradicts this:

      The built-in sorted() function is guaranteed to be stable. A sort is stable if it guarantees not to change the relative order of elements that compare equal.

      So the elements in the sequence iterated do not change order. Nowhere does this state that dictionaries cannot produce a different order when iterated however.

      In other words, it doesn't matter if iterable_a == iterable_b here, it is not about iterable equality, only element equality matters to the sort order stability. If the iteration order differs, that order is kept stable.