Search code examples
pythonpython-itertools

Increasing number of permutations of all possible combinations of a list with repetitions allowed


I am trying without success to understand how to use itertools to generate a list with all possible combinations of the elements of a list, with an increasing size of elements to pick and including repetitions. I would like to add also a separator:

lis = ['a','b','c']
separator = '/'
total_number_of_combinations = 3
permutation_list = ['a','b','c', 'a/a', 'a/b', 'a/c', 'b/a', 'b/b', 'b/c', 'c/a', 'c/b', 'c/c',
                    'a/a/a', 'a/a/b', 'a/a/c', 'a/b/a', 'a/b/b', 'a/b/c', 'a/c/a', 'a/c/b', 'a/c/c'
                    'b/a/a', 'b/a/b', 'b/a/c', 'b/b/a', 'b/b/b', 'b/b/c', 'b/c/a', 'b/c/b', 'b/c/c'
                    'c/a/a', 'c/a/b', 'c/a/c', 'c/b/a', 'c/b/b', 'c/b/c', 'c/c/a', 'c/c/b', 'c/c/c']

The list will have then len(lis)+len(lis)**2+len(lis)**3+...++len(lis)**n elements, with n=total_number_of_combinations. I need to keep the separator and the total_numbers_of_combinations changeables.

I need this in a list that can be check as a condition for filtering a pandas DataFrame (i will check dt[dt.my_col.isin(permutation_list)])

I appreciate any help or pointing to a duplicated topic or even an explanation of how to correctly state this problem, because I did not found any topic that answer this question (maybe I am using the wrong keywords...). Maybe also there is a function from another module that does that, but I don't know.

UPDATE: Following the request of @Scott, here is my real case:

lis = ['BRUTELE','COCKPIT EST', 'CIRCET']
separator = ' / '
total_number_of_combinations = 10

so my final list need to have 88572 elements.


Solution

  • What you are trying to get is a product, not a combination.

    lis = ["BRUTELE", "COCKPIT EST", "CIRCET"]
    separator = " / "
    total_number_of_combinations = 3
    
    result = list(
        itertools.chain.from_iterable(
            (separator.join(a) for a in itertools.product(lis, repeat=i))
            for i in range(1, total_number_of_combinations + 1)
        )
    )
    
    assert result == ['BRUTELE', 'COCKPIT EST', 'CIRCET', 'BRUTELE / BRUTELE', 'BRUTELE / COCKPIT EST', 'BRUTELE / CIRCET', 'COCKPIT EST / BRUTELE', 'COCKPIT EST / COCKPIT EST', 'COCKPIT EST / CIRCET', 'CIRCET / BRUTELE', 'CIRCET / COCKPIT EST', 'CIRCET / CIRCET', 'BRUTELE / BRUTELE / BRUTELE', 'BRUTELE / BRUTELE / COCKPIT EST', 'BRUTELE / BRUTELE / CIRCET', 'BRUTELE / COCKPIT EST / BRUTELE', 'BRUTELE / COCKPIT EST / COCKPIT EST', 'BRUTELE / COCKPIT EST / CIRCET', 'BRUTELE / CIRCET / BRUTELE', 'BRUTELE / CIRCET / COCKPIT EST', 'BRUTELE / CIRCET / CIRCET', 'COCKPIT EST / BRUTELE / BRUTELE', 'COCKPIT EST / BRUTELE / COCKPIT EST', 'COCKPIT EST / BRUTELE / CIRCET', 'COCKPIT EST / COCKPIT EST / BRUTELE', 'COCKPIT EST / COCKPIT EST / COCKPIT EST', 'COCKPIT EST / COCKPIT EST / CIRCET', 'COCKPIT EST / CIRCET / BRUTELE', 'COCKPIT EST / CIRCET / COCKPIT EST', 'COCKPIT EST / CIRCET / CIRCET', 'CIRCET / BRUTELE / BRUTELE', 'CIRCET / BRUTELE / COCKPIT EST', 'CIRCET / BRUTELE / CIRCET', 'CIRCET / COCKPIT EST / BRUTELE', 'CIRCET / COCKPIT EST / COCKPIT EST', 'CIRCET / COCKPIT EST / CIRCET', 'CIRCET / CIRCET / BRUTELE', 'CIRCET / CIRCET / COCKPIT EST', 'CIRCET / CIRCET / CIRCET']                                      
    

    However, the number of resulting items is O(len(lis) ** total_number_of_combinations), which may be computationally expensive. A better method would be parsing a string by splitting at separators and testing the membership of each split string.