Search code examples
pythonsortinghashmapsetfrozenset

Sorting a list of python sets by value


The frozenset docs says:

The frozenset type is immutable and hashable — its contents cannot be altered after it is created; it can therefore be used as a dictionary key or as an element of another set.

However, the docs for for python sets says:

Since sets only define partial ordering (subset relationships), the output of the list.sort() method is undefined for lists of sets.

This makes me ask: why is the case? And, if I wanted to sort a list of sets by set content, how could I do this? I know that the extension intbitset: https://pypi.python.org/pypi/intbitset/2.3.0 , has a function for returning a bit sequence that represents the set contents. Is there something comparable for python sets?


Solution

  • Tuples, lists, strings, etc. have a natural lexicographic ordering and can be sorted because you can always compare two elements of a given collection. That is, either a < b, b < a, or a == b.

    A natural comparison between two sets is having a <= b mean a is a subset of b, which is what the expression a <= b actually does in Python. What the documentation means by "partial ordering" is that not all sets are comparable. Take, for example, the following sets:

    a = {1, 2, 3}
    b = {4, 5, 6}
    

    Is a a subset of b? No. Is b a subset of a? No. Are they equal? No. If you can't compare them at all, you clearly can't sort them.

    The only way you can sort a collection of sets is if your comparison function actually can compare any two elements (a total order). This means you can still sort a collection of sets using the above subset relation, but you will have to ensure that all of the sets are comparable (e.g. [{1}, {1, 2, 4}, {1, 2}]).

    The easiest way to do what you want is to transform each individual set into something that you actually can compare. Basically, you do f(a) <= f(b) (where <= is obvious) for some simple function f. This is done with the key keyword argument:

    In [10]: def f(some_set):
       ...       return max(some_set)
       ...
    
    In [11]: sorted([{1, 2, 3, 999}, {4, 5, 6}, {7, 8, 9}], key=f)
    Out[11]: [{4, 5, 6}, {7, 8, 9}, {1, 2, 3, 999}]
    

    You're sorting [f(set1), f(set2), f(set3)] and applying the resulting ordering to [set1, set2, set3].