Search code examples
pythoniteratorgeneratorcombinationspython-itertools

Efficient enumeration of ordered subsets in Python


I'm not sure of the appropriate mathematical terminology for the code I'm trying to write. I'd like to generate combinations of unique integers, where "ordered subsets" of each combination are used to exclude certain later combinations.

Hopefully an example will make this clear:

from itertools import chain, combinations
​
mylist = range(4)
max_depth = 3

rev = chain.from_iterable(combinations(mylist, i) for i in xrange(max_depth, 0, -1))
for el in list(rev):
    print el

That code results in output that contains all the subsets I want, but also some extra ones that I do not. I have manually inserted comments to indicate which elements I don't want.

(0, 1, 2)
(0, 1, 3)
(0, 2, 3)
(1, 2, 3)
(0, 1)  # Exclude: (0, 1, _) occurs as part of (0, 1, 2) above
(0, 2)  # Exclude: (0, 2, _) occurs above
(0, 3)  # Keep
(1, 2)  # Exclude: (1, 2, _) occurs above
(1, 3)  # Keep: (_, 1, 3) occurs above, but (1, 3, _) does not
(2, 3)  # Keep
(0,)    # Exclude: (0, _, _) occurs above
(1,)    # Exclude: (1, _, _) occurs above
(2,)    # Exclude: (2, _) occurs above
(3,)    # Keep

Thus, the desired output of my generator or iterator would be:

(0, 1, 2)
(0, 1, 3)
(0, 2, 3)
(1, 2, 3)
(0, 3)
(1, 3)
(2, 3)
(3,)  

I know I could make a list of all the (wanted and unwanted) combinations and then filter out the ones I don't want, but I was wondering if there was a more efficient, generator or iterator based way.


Solution

  • You are trying to exclude any combination that is a prefix of a previously-returned combination. Doing so is straightforward.

    • If a tuple t has length max_depth, it can't be a prefix of a previously-returned tuple, since any tuple it's a prefix of would have to be longer.
    • If a tuple t ends with mylist[-1], then it can't be a prefix of a previously-returned tuple, since there are no elements that could legally be added to the end of t to extend it.
    • If a tuple t has length less than max_depth and does not end with mylist[-1], then t is a prefix of the previously-returned tuple t + (mylist[-1],), and t should not be returned.

    Thus, the combinations you should generate are exactly the ones of length max_depth and the shorter ones that end with mylist[-1]. The following code does so, in exactly the same order as your original code, and correctly handling cases like maxdepth > len(mylist):

    def nonprefix_combinations(iterable, maxlen):
        iterable = list(iterable)
        if not (iterable and maxlen):
            return
        for comb in combinations(iterable, maxlen):
            yield comb
        for length in xrange(maxlen-2, -1, -1):
            for comb in combinations(iterable[:-1], length):
                yield comb + (iterable[-1],)
    

    (I've assumed here that in the case where maxdepth == 0, you still don't want to include the empty tuple in your output, even though for maxdepth == 0, it isn't a prefix of a previously-returned tuple. If you do want the empty tuple in this case, you can change if not (iterable and maxlen) to if not iterable.)