python, combinations, combinatorics, python-itertools

Generate all combinations of a list given a condition


I would like to generate all combinations of length n from a list of k variables. I can do this as follows:

import itertools
import pandas as pd
from sklearn import datasets

dataset = datasets.load_breast_cancer()
X = dataset.data
y = dataset.target
df = pd.DataFrame(X, columns=dataset.feature_names)
features = dataset.feature_names

x = set(['mean radius', 'mean texture'])

for s in itertools.combinations(features, 3):
    if x.issubset(set(s)):
        print(s)

Since len(features) is 30, this loops over 4,060 combinations for n=3. For n=10, that is 30,045,015 combinations:

len(tuple(itertools.combinations(features, 10)))

Each of these combinations is then checked against the conditional statement. However, for n>10 this becomes infeasible.
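
The counts quoted above can be checked without building the tuple of combinations, because the number of n-combinations of k items is the binomial coefficient. A minimal sketch, assuming Python 3.8 or later for math.comb; the helper name count_combinations is hypothetical and only used for illustration:

```python
import math

def count_combinations(k, n):
    """Number of ways to choose n items out of k (hypothetical helper)."""
    return math.comb(k, n)

k = 30  # len(features) for the breast cancer dataset, as stated above

for n in (3, 10):
    # Counts only; nothing is generated or held in memory.
    print(n, count_combinations(k, n))
```

This only counts the combinations, so it returns immediately even for sizes where iterating over them would take far too long.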

Instead of generating all combinations and then filtering by some condition, as in this example, is it possible to generate only the combinations that satisfy the condition?

In other words, generate all combinations where n=3, 4, 5 ... k, given that 'mean radius' and 'mean texture' appear in the combination?


Solution

  • Generate the combinations from the remaining features only (without 'mean radius' and 'mean texture'), choosing n - 2 of them, and then add those two to every combination. This greatly reduces the number of combinations, and there is nothing to filter: every combination generated is useful.

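    n = 3  # desired combination length (example value)
    # remove the fixed features from the pool:
    remaining = set(features) - x
    # only n - len(x) slots are left to fill from the remaining features
    for s in itertools.combinations(remaining, n - len(x)):
        s = set(s) | x  # add the fixed features to each combination (union)
        print(s)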
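
The same idea can be wrapped in a generator and looped over every size from the number of fixed features up to k, which is what the question asks for. This is a sketch rather than a definitive implementation: the function name combinations_containing and the short letter list standing in for the feature names are assumptions made for illustration. It uses tuples instead of sets so that the output order is deterministic.

```python
import itertools

def combinations_containing(items, required, n):
    """Yield every n-combination of `items` that contains all of `required`.

    Hypothetical helper: `required` is assumed to be a subset of `items`.
    """
    required = tuple(required)
    # Keep the original order of the remaining items.
    rest = [item for item in items if item not in required]
    free_slots = n - len(required)
    if free_slots < 0:
        return  # n is too small to hold all required items
    for extra in itertools.combinations(rest, free_slots):
        # Tuple concatenation plays the role of the set union.
        yield required + extra

features = ["a", "b", "c", "d", "e"]  # stand-in for dataset.feature_names
fixed = ("a", "b")                    # stand-in for the two fixed features

# All sizes from len(fixed) up to k = len(features)
for n in range(len(fixed), len(features) + 1):
    for combo in combinations_containing(features, fixed, n):
        print(combo)
```

With 30 features and two of them fixed, n=10 requires choosing 8 of the remaining 28, which is 3,108,105 combinations instead of 30,045,015, and none of them has to be discarded.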