I need to iterate through all possible combinations, putting elements from initial iterable into multiple buckets.
Illustration: Put ABCDEFG
into two buckets with lengths 3
and 2
-->
[ABC, DE], [ABC, DF], [ABC, DG], [ABD, CE], [ABD, EF], ..., [CDE, FG]
No repeats, only unique combinations in each bucket (without permutations).
Edit: The buckets are important because ABC, DE
and ABD, CE
considered as different combinations (still the same 5 elements overall but distributed differently in buckets). While ABC, DE
and ACB, DE
considered as the same combination.
The order change within bucket is not a different combination. While moving element between buckets is a different combination.
I suppose I should use itertools.combinations() some way:
itertools.combinations('ABCDEFG', 3) --> ABC, ABD, ..., EFG
or combine it with something.
But I'm struggling to figure out how to further distribute the output across multiple buckets of different lengths.
Here's on possible implementation that relies on the elements of the "sample space" being hashable. This assumes that it matters which bucket an element falls in, so particular [ABC, DE]
and [ABD, CE]
are considered two different outcomes.
import itertools
from typing import Iterable, Iterator, TypeVar
_T = TypeVar("_T")
def two_bin_combs(
iterable: Iterable[_T], b1: int, b2: int
) -> Iterator[tuple[tuple[_T, ...], tuple[_T, ...]]]:
space = set(iterable)
for left in itertools.combinations(space, b1):
for right in itertools.combinations(space - set(left), b2):
yield left, right
Then
sample_space = "ABCDEFG"
buckets = list(two_bin_combs(sample_space, 3, 2))
print(len(buckets))
print(buckets)
outputs
210
[
(('A', 'B', 'C'), ('E', 'D')),
(('A', 'B', 'D'), ('C', 'E')),
(('A', 'B', 'E'), ('C', 'D')),
(('A', 'C', 'D'), ('E', 'B')),
(('A', 'C', 'E'), ('D', 'B'))
# ...
(('C', 'F', 'G'), ('D', 'E')),
(('D', 'E', 'F'), ('G', 'C')),
(('D', 'E', 'G'), ('F', 'C')),
(('D', 'F', 'G'), ('C', 'E')),
(('E', 'F', 'G'), ('D', 'C')),
]
Note that the output order is not deterministic, since it depends on set iteration order, which can vary run-to-run.
This approach can also be generalized to an arbitrary number of bins:
def bin_combs(
iterable: Iterable[_T], *bin_sizes: int
) -> Iterator[tuple[tuple[_T, ...], ...]]:
total_space = frozenset(iterable)
if len(total_space) < sum(bin_sizes):
raise ValueError("total size of bins exceeds size of sample space")
def _bin_combs(
space: frozenset[_T], *bin_sizes: int
) -> Iterator[tuple[tuple[_T, ...], ...]]:
if not bin_sizes:
yield ()
return
curr_size, *remaining = bin_sizes
for head in itertools.combinations(space, curr_size):
for tail in _bin_combs(space - frozenset(head), *remaining):
yield (head,) + tail
return _bin_combs(total_space, *bin_sizes)