I want to iterate over subsequences of a list, but I have a notion of uniqueness defined by an external function, and I want to ignore combinations that have multiple elements with the same value under the function.
For example, I have a list of names, and I want to iterate over all combinations of three of those names such that all three names start with different letters. The following code accomplishes this:
import itertools

names = ["Anabel",
         "Alison",
         "Avery",
         "Abigail",
         "Aimee",
         "Alice",
         "Bethany",
         "Beatrice",
         "Claudia",
         "Carolyn",
         "Diane",
         "Dana"]

f = lambda x: x[0]

for i in itertools.combinations(names, 3):
    if ((f(i[0]) != f(i[1])) and
        (f(i[0]) != f(i[2])) and
        (f(i[1]) != f(i[2]))):
        print i
What I'm actually doing here is iterating over all possible combinations of 3 names and throwing out the ones that don't satisfy the condition, which is of course slower than only ever generating the combinations I want. Is there a way that would actually be faster? That is, can I create an iterator that excludes the unwanted combinations in the first place?
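(Side note: the check itself can be written more compactly, for example with a set comprehension, but that of course doesn't change the fact that every combination is still generated and then tested:)

import itertools

# same names list and f as above
for i in itertools.combinations(names, 3):
    # keep the tuple only if all three first letters are distinct
    if len({f(x) for x in i}) == 3:
        print(i)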
Yes, it's possible. One solution is to create a dictionary that groups the names by their value under f. I'll use iteration_utilities.groupedby here, but it's easy to do the same thing with collections.defaultdict (shown at the bottom of the answer).
>>> from iteration_utilities import groupedby
>>> equivalent = groupedby(names, f)
>>> equivalent
{'A': ['Anabel', 'Alison', 'Avery', 'Abigail', 'Aimee', 'Alice'],
'B': ['Bethany', 'Beatrice'],
'C': ['Claudia', 'Carolyn'],
'D': ['Diane', 'Dana']}
Then you iterate over the combinations of the (sorted) keys of that dictionary and use itertools.product to iterate over all the names for each first letter in the combination:
import itertools

for comb in itertools.combinations(sorted(equivalent), 3):
    for uniquecomb in itertools.product(*[equivalent[i] for i in comb]):
        print(uniquecomb)
The sorted
is used because otherwise the order of appearance wouldn't be deterministic between runs.
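If you want this packaged as a reusable generator, a minimal sketch could look like the following (the function name unique_combs_iter and the r/key parameters are my own choices, not something provided by iteration_utilities):

import itertools
from iteration_utilities import groupedby

def unique_combs_iter(iterable, r, key):
    """Yield r-combinations whose elements are pairwise distinct under key."""
    equivalent = groupedby(iterable, key)
    # combinations of the (sorted) group keys, then the cartesian product
    # of the corresponding groups - exactly the loop shown above
    for comb in itertools.combinations(sorted(equivalent), r):
        for uniquecomb in itertools.product(*[equivalent[k] for k in comb]):
            yield uniquecomb

With the names list above, list(unique_combs_iter(names, 3, f)) gives the same combinations as the loop.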
To show that this is much faster I used the following setup:
import itertools
from iteration_utilities import groupedby

def unique_combs(names, f):
    equivalent = groupedby(names, f)
    for comb in itertools.combinations(sorted(equivalent), 3):
        for uniquecomb in itertools.product(*[equivalent[i] for i in comb]):
            pass

def unique_combs_original(names, f):
    for i in itertools.combinations(names, 3):
        if ((f(i[0]) != f(i[1])) and
            (f(i[0]) != f(i[2])) and
            (f(i[1]) != f(i[2]))):
            pass
names = ["Anabel", "Alison", "Avery", "Abigail", "Aimee", "Alice",
"Bethany", "Beatrice",
"Claudia", "Carolyn",
"Diane", "Dana"]
f = lambda x : x[0]
%timeit unique_combs(names, f) # 10000 loops, best of 3: 59.4 µs per loop
%timeit unique_combs_original(names, f) # 1000 loops, best of 3: 417 µs per loop
But it also scales much better when there are lots of combinations that would be discarded:
names = names * 10 # more duplicates
%timeit unique_combs(names, f) # 100 loops, best of 3: 9.74 ms per loop
%timeit unique_combs_original(names, f) # 1 loop, best of 3: 577 ms per loop
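As a quick sanity check (this snippet and the two generator names are my own addition, not part of the benchmark above), both approaches produce the same set of combinations for the original 12-name list:

import itertools
from iteration_utilities import groupedby

names = ["Anabel", "Alison", "Avery", "Abigail", "Aimee", "Alice",
         "Bethany", "Beatrice", "Claudia", "Carolyn", "Diane", "Dana"]
f = lambda x: x[0]

def grouped_combs(names, f):
    # grouping approach from above, but yielding instead of passing
    equivalent = groupedby(names, f)
    for comb in itertools.combinations(sorted(equivalent), 3):
        for uniquecomb in itertools.product(*[equivalent[i] for i in comb]):
            yield uniquecomb

def filtered_combs(names, f):
    # original filter approach, yielding the surviving combinations
    for i in itertools.combinations(names, 3):
        if len({f(x) for x in i}) == 3:
            yield i

assert set(grouped_combs(names, f)) == set(filtered_combs(names, f))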
I mentioned collections.defaultdict as an alternative to groupedby; for completeness, the grouping can be built with it like this:
>>> from collections import defaultdict
>>> names = ["Anabel", "Alison", "Avery", "Abigail", "Aimee", "Alice",
... "Bethany", "Beatrice",
... "Claudia", "Carolyn",
... "Diane", "Dana"]
>>> equivalent = defaultdict(list)
>>> for name in names:
... equivalent[f(name)].append(name)
>>> equivalent
defaultdict(list,
{'A': ['Anabel', 'Alison', 'Avery', 'Abigail', 'Aimee', 'Alice'],
'B': ['Bethany', 'Beatrice'],
'C': ['Claudia', 'Carolyn'],
'D': ['Diane', 'Dana']})
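If you'd rather avoid the extra dependency entirely, the standard library's itertools.groupby can build the same grouping, provided the names are sorted by the key first (it only groups consecutive equal keys). A sketch:

import itertools

equivalent = {key: list(group)
              for key, group in itertools.groupby(sorted(names, key=f), key=f)}
# same mapping as above: sorted() is stable, so the names keep their
# original relative order within each group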