Search code examples
pythoncombinationspython-itertools

How to put elements into few buckets (probably with itertools.combinations())?


I need to iterate through all possible combinations, putting elements from initial iterable into multiple buckets.
Illustration: Put ABCDEFG into two buckets with lengths 3 and 2 -->

[ABC, DE], [ABC, DF], [ABC, DG], [ABD, CE], [ABD, EF], ..., [CDE, FG]

No repeats, only unique combinations in each bucket (without permutations).

Edit: The buckets are important because ABC, DE and ABD, CE considered as different combinations (still the same 5 elements overall but distributed differently in buckets). While ABC, DE and ACB, DE considered as the same combination.
The order change within bucket is not a different combination. While moving element between buckets is a different combination.

I suppose I should use itertools.combinations() some way:

itertools.combinations('ABCDEFG', 3) --> ABC, ABD, ..., EFG

or combine it with something.

But I'm struggling to figure out how to further distribute the output across multiple buckets of different lengths.


Solution

  • Here's on possible implementation that relies on the elements of the "sample space" being hashable. This assumes that it matters which bucket an element falls in, so particular [ABC, DE] and [ABD, CE] are considered two different outcomes.

    import itertools
    from typing import Iterable, Iterator, TypeVar
    
    
    _T = TypeVar("_T")
    
    
    def two_bin_combs(
        iterable: Iterable[_T], b1: int, b2: int
    ) -> Iterator[tuple[tuple[_T, ...], tuple[_T, ...]]]:
        space = set(iterable)
        for left in itertools.combinations(space, b1):
            for right in itertools.combinations(space - set(left), b2):
                yield left, right
    

    Then

    sample_space = "ABCDEFG"
    buckets = list(two_bin_combs(sample_space, 3, 2))
    
    print(len(buckets))
    print(buckets)
    

    outputs

    210
    [
        (('A', 'B', 'C'), ('E', 'D')),
        (('A', 'B', 'D'), ('C', 'E')),
        (('A', 'B', 'E'), ('C', 'D')), 
        (('A', 'C', 'D'), ('E', 'B')), 
        (('A', 'C', 'E'), ('D', 'B'))
        # ...
        (('C', 'F', 'G'), ('D', 'E')), 
        (('D', 'E', 'F'), ('G', 'C')), 
        (('D', 'E', 'G'), ('F', 'C')), 
        (('D', 'F', 'G'), ('C', 'E')), 
        (('E', 'F', 'G'), ('D', 'C')),
    ]
    

    Note that the output order is not deterministic, since it depends on set iteration order, which can vary run-to-run.


    This approach can also be generalized to an arbitrary number of bins:

    def bin_combs(
        iterable: Iterable[_T], *bin_sizes: int
    ) -> Iterator[tuple[tuple[_T, ...], ...]]:
    
        total_space = frozenset(iterable)
        if len(total_space) < sum(bin_sizes):
            raise ValueError("total size of bins exceeds size of sample space")
    
        def _bin_combs(
            space: frozenset[_T], *bin_sizes: int
        ) -> Iterator[tuple[tuple[_T, ...], ...]]:
    
            if not bin_sizes:
                yield ()
                return
    
            curr_size, *remaining = bin_sizes
    
            for head in itertools.combinations(space, curr_size):
                for tail in _bin_combs(space - frozenset(head), *remaining):
                    yield (head,) + tail
    
        return _bin_combs(total_space, *bin_sizes)