I am trying to find specific combinations of items in a list. The list is made up of x groups that repeat y times. In this example x and y = 3, but in practice could be much bigger in size. I want to find every combination of groups and y, but without duplicating an x value for a given combination. I think it’s easier to just show an example of what I want.
Here is an example.
A = ['ST1_0.245', 'ST1_0.29', 'ST1_0.335', 'ST2_0.245', 'ST2_0.29', 'ST2_0.335', 'ST3_0.245', 'ST3_0.29', 'ST3_0.335']
So three groups, ST1, ST2, and ST3 – each having 3 iterations, 0.245, 0.290, and 0.335.
I want to find the following combinations.
('ST1_0.245', 'ST2_0.245', 'ST3_0.245')
('ST1_0.245', 'ST2_0.245', 'ST3_0.29')
('ST1_0.245', 'ST2_0.245', 'ST3_0.335')
('ST1_0.245', 'ST2_0.29', 'ST3_0.245')
('ST1_0.245', 'ST2_0.29', 'ST3_0.29')
('ST1_0.245', 'ST2_0.29', 'ST3_0.335')
('ST1_0.245', 'ST2_0.335', 'ST3_0.245')
('ST1_0.245', 'ST2_0.335', 'ST3_0.29')
('ST1_0.245', 'ST2_0.335', 'ST3_0.335')
('ST1_0.29', 'ST2_0.245', 'ST3_0.245')
('ST1_0.29', 'ST2_0.245', 'ST3_0.29')
('ST1_0.29', 'ST2_0.245', 'ST3_0.335')
('ST1_0.29', 'ST2_0.29', 'ST3_0.245')
('ST1_0.29', 'ST2_0.29', 'ST3_0.29')
('ST1_0.29', 'ST2_0.29', 'ST3_0.335')
('ST1_0.29', 'ST2_0.335', 'ST3_0.245')
('ST1_0.29', 'ST2_0.335', 'ST3_0.29')
('ST1_0.29', 'ST2_0.335', 'ST3_0.335')
('ST1_0.335', 'ST2_0.245', 'ST3_0.245')
('ST1_0.335', 'ST2_0.245', 'ST3_0.29')
('ST1_0.335', 'ST2_0.245', 'ST3_0.335')
('ST1_0.335', 'ST2_0.29', 'ST3_0.245')
('ST1_0.335', 'ST2_0.29', 'ST3_0.29')
('ST1_0.335', 'ST2_0.29', 'ST3_0.335')
('ST1_0.335', 'ST2_0.335', 'ST3_0.245')
('ST1_0.335', 'ST2_0.335', 'ST3_0.29')
('ST1_0.335', 'ST2_0.335', 'ST3_0.335')
Note that ST1, ST2, and ST3 are only in each combination once.
Here is code that I got to work at least for small cases.
import itertools
import numpy as np
comb = []
gr_list=['ST1','ST2','ST3']
for itr in itertools.combinations(A, len(gr_list)):
# pdb.set_trace()
for n in np.arange(len(gr_list)):
if sum(itr[n].split('_')[0] in s for s in itr) > 1:
break
if n == len(gr_list)-1:
comb.append(itr)
This works for a few small examples I tested, but when I tried larger values, I was getting more results than I thought, but that could be a mistake in my attempts to calculate how many are expected. But either way, it takes far too long. Is there a faster way to do this?
I do have both the values separately to begin with. Which as I write this I am feeling like thats a better way to approach it, but I am not sure how to do that either.
Create the groups as required and then use itertools.product
on the groups:
A = ['ST1_0.245', 'ST1_0.29', 'ST1_0.335',
'ST2_0.245', 'ST2_0.29', 'ST2_0.335',
'ST3_0.245', 'ST3_0.29', 'ST3_0.335']
prefixes = set(s.split("_")[0] for s in A)
groups = [[a for a in A if a.split("_")[0]==p] for p in prefixes]
>>> list(itertools.product(*groups))
[('ST2_0.245', 'ST3_0.245', 'ST1_0.245'),
('ST2_0.245', 'ST3_0.245', 'ST1_0.29'),
('ST2_0.245', 'ST3_0.245', 'ST1_0.335'),
('ST2_0.245', 'ST3_0.29', 'ST1_0.245'),
('ST2_0.245', 'ST3_0.29', 'ST1_0.29'),
('ST2_0.245', 'ST3_0.29', 'ST1_0.335'),
('ST2_0.245', 'ST3_0.335', 'ST1_0.245'),
('ST2_0.245', 'ST3_0.335', 'ST1_0.29'),
('ST2_0.245', 'ST3_0.335', 'ST1_0.335'),
('ST2_0.29', 'ST3_0.245', 'ST1_0.245'),
('ST2_0.29', 'ST3_0.245', 'ST1_0.29'),
('ST2_0.29', 'ST3_0.245', 'ST1_0.335'),
('ST2_0.29', 'ST3_0.29', 'ST1_0.245'),
('ST2_0.29', 'ST3_0.29', 'ST1_0.29'),
('ST2_0.29', 'ST3_0.29', 'ST1_0.335'),
('ST2_0.29', 'ST3_0.335', 'ST1_0.245'),
('ST2_0.29', 'ST3_0.335', 'ST1_0.29'),
('ST2_0.29', 'ST3_0.335', 'ST1_0.335'),
('ST2_0.335', 'ST3_0.245', 'ST1_0.245'),
('ST2_0.335', 'ST3_0.245', 'ST1_0.29'),
('ST2_0.335', 'ST3_0.245', 'ST1_0.335'),
('ST2_0.335', 'ST3_0.29', 'ST1_0.245'),
('ST2_0.335', 'ST3_0.29', 'ST1_0.29'),
('ST2_0.335', 'ST3_0.29', 'ST1_0.335'),
('ST2_0.335', 'ST3_0.335', 'ST1_0.245'),
('ST2_0.335', 'ST3_0.335', 'ST1_0.29'),
('ST2_0.335', 'ST3_0.335', 'ST1_0.335')]