Search code examples
pythonpython-itertools

Combinations of items in a list using specific critera


I am trying to find specific combinations of items in a list. The list is made up of x groups that repeat y times. In this example x and y = 3, but in practice could be much bigger in size. I want to find every combination of groups and y, but without duplicating an x value for a given combination. I think it’s easier to just show an example of what I want.

Here is an example.

A = ['ST1_0.245', 'ST1_0.29', 'ST1_0.335', 'ST2_0.245', 'ST2_0.29', 'ST2_0.335', 'ST3_0.245', 'ST3_0.29', 'ST3_0.335']

So three groups, ST1, ST2, and ST3 – each having 3 iterations, 0.245, 0.290, and 0.335.

I want to find the following combinations.

('ST1_0.245', 'ST2_0.245', 'ST3_0.245')
('ST1_0.245', 'ST2_0.245', 'ST3_0.29')
('ST1_0.245', 'ST2_0.245', 'ST3_0.335')
('ST1_0.245', 'ST2_0.29', 'ST3_0.245')
('ST1_0.245', 'ST2_0.29', 'ST3_0.29')
('ST1_0.245', 'ST2_0.29', 'ST3_0.335')
('ST1_0.245', 'ST2_0.335', 'ST3_0.245')
('ST1_0.245', 'ST2_0.335', 'ST3_0.29')
('ST1_0.245', 'ST2_0.335', 'ST3_0.335')
('ST1_0.29', 'ST2_0.245', 'ST3_0.245')
('ST1_0.29', 'ST2_0.245', 'ST3_0.29')
('ST1_0.29', 'ST2_0.245', 'ST3_0.335')
('ST1_0.29', 'ST2_0.29', 'ST3_0.245')
('ST1_0.29', 'ST2_0.29', 'ST3_0.29')
('ST1_0.29', 'ST2_0.29', 'ST3_0.335')
('ST1_0.29', 'ST2_0.335', 'ST3_0.245')
('ST1_0.29', 'ST2_0.335', 'ST3_0.29')
('ST1_0.29', 'ST2_0.335', 'ST3_0.335')
('ST1_0.335', 'ST2_0.245', 'ST3_0.245')
('ST1_0.335', 'ST2_0.245', 'ST3_0.29')
('ST1_0.335', 'ST2_0.245', 'ST3_0.335')
('ST1_0.335', 'ST2_0.29', 'ST3_0.245')
('ST1_0.335', 'ST2_0.29', 'ST3_0.29')
('ST1_0.335', 'ST2_0.29', 'ST3_0.335')
('ST1_0.335', 'ST2_0.335', 'ST3_0.245')
('ST1_0.335', 'ST2_0.335', 'ST3_0.29')
('ST1_0.335', 'ST2_0.335', 'ST3_0.335')

Note that ST1, ST2, and ST3 are only in each combination once.

Here is code that I got to work at least for small cases.

import itertools
import numpy as np

comb = []
gr_list=['ST1','ST2','ST3']
for itr in itertools.combinations(A, len(gr_list)):
    # pdb.set_trace()
    for n in np.arange(len(gr_list)):
        if sum(itr[n].split('_')[0] in s for s in itr) > 1:
            break
    
    if n == len(gr_list)-1:
        comb.append(itr)

This works for a few small examples I tested, but when I tried larger values, I was getting more results than I thought, but that could be a mistake in my attempts to calculate how many are expected. But either way, it takes far too long. Is there a faster way to do this?

I do have both the values separately to begin with. Which as I write this I am feeling like thats a better way to approach it, but I am not sure how to do that either.


Solution

  • Create the groups as required and then use itertools.product on the groups:

    A = ['ST1_0.245', 'ST1_0.29', 'ST1_0.335', 
         'ST2_0.245', 'ST2_0.29', 'ST2_0.335', 
         'ST3_0.245', 'ST3_0.29', 'ST3_0.335']
    
    prefixes = set(s.split("_")[0] for s in A)
    groups = [[a for a in A if a.split("_")[0]==p] for p in prefixes]
    
    >>> list(itertools.product(*groups))
    
    [('ST2_0.245', 'ST3_0.245', 'ST1_0.245'),
     ('ST2_0.245', 'ST3_0.245', 'ST1_0.29'),
     ('ST2_0.245', 'ST3_0.245', 'ST1_0.335'),
     ('ST2_0.245', 'ST3_0.29', 'ST1_0.245'),
     ('ST2_0.245', 'ST3_0.29', 'ST1_0.29'),
     ('ST2_0.245', 'ST3_0.29', 'ST1_0.335'),
     ('ST2_0.245', 'ST3_0.335', 'ST1_0.245'),
     ('ST2_0.245', 'ST3_0.335', 'ST1_0.29'),
     ('ST2_0.245', 'ST3_0.335', 'ST1_0.335'),
     ('ST2_0.29', 'ST3_0.245', 'ST1_0.245'),
     ('ST2_0.29', 'ST3_0.245', 'ST1_0.29'),
     ('ST2_0.29', 'ST3_0.245', 'ST1_0.335'),
     ('ST2_0.29', 'ST3_0.29', 'ST1_0.245'),
     ('ST2_0.29', 'ST3_0.29', 'ST1_0.29'),
     ('ST2_0.29', 'ST3_0.29', 'ST1_0.335'),
     ('ST2_0.29', 'ST3_0.335', 'ST1_0.245'),
     ('ST2_0.29', 'ST3_0.335', 'ST1_0.29'),
     ('ST2_0.29', 'ST3_0.335', 'ST1_0.335'),
     ('ST2_0.335', 'ST3_0.245', 'ST1_0.245'),
     ('ST2_0.335', 'ST3_0.245', 'ST1_0.29'),
     ('ST2_0.335', 'ST3_0.245', 'ST1_0.335'),
     ('ST2_0.335', 'ST3_0.29', 'ST1_0.245'),
     ('ST2_0.335', 'ST3_0.29', 'ST1_0.29'),
     ('ST2_0.335', 'ST3_0.29', 'ST1_0.335'),
     ('ST2_0.335', 'ST3_0.335', 'ST1_0.245'),
     ('ST2_0.335', 'ST3_0.335', 'ST1_0.29'),
     ('ST2_0.335', 'ST3_0.335', 'ST1_0.335')]