How to create combinations from dataframes for a specific combination size


Say I have a dataframe with 2 columns. How would I create all possible combinations of its rows for a specific combination size? Each row of the df should be treated as a single item in the combination rather than as 2 separate items. I want the columns of each combination to be appended to the right. The solution should ideally be efficient, since generating all the combinations takes a long time with a large list.

For example, I want to create all possible combinations with a combination size of 3.

import pandas as pd

df = pd.DataFrame({'A':['a','b','c','d'], 'B':['1','2','3','4']})

How would I get my dataframe to look like this?

    A  B  A  B  A  B
0   a  1  b  2  c  3
1   a  1  b  2  d  4
2   a  1  c  3  d  4
3   b  2  c  3  d  4

Solution

  • One approach is to use itertools to generate the combinations.

    • Define the combination size and generate all possible combinations of rows using itertools.combinations.
    • Flatten each combination into a single list of values using itertools.chain.
    • Build combination_df from the flattened combinations, generating the column names dynamically so that 'A' and 'B' repeat once for each row in a combination.

    Sample

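    import itertools

    combination_size = 3
    # Each combination is a tuple of `combination_size` rows (NumPy arrays)
    combinations = list(itertools.combinations(df.values, combination_size))
    combination_df = pd.DataFrame(
        # Flatten each tuple of rows into one flat list of values
        [list(itertools.chain(*comb)) for comb in combinations],
        # Repeat the original column names once per row in the combination
        columns=[col for i in range(combination_size) for col in df.columns]
    )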
    

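    EDIT: Optimisation as suggested by @ouroboros1

    from itertools import chain
    import numpy as np

    combination_df = pd.DataFrame(
        # Lazily flatten each combination instead of building a list per row
        (chain.from_iterable(c) for c in combinations),
        columns=np.tile(df.columns, combination_size)
    )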
    

    Output

       A  B  A  B  A  B
    0  a  1  b  2  c  3
    1  a  1  b  2  d  4
    2  a  1  c  3  d  4
    3  b  2  c  3  d  4
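
Since efficiency was part of the question, one possible variation (not part of the answer above, so treat it as a sketch) is to combine row positions instead of the row arrays and let NumPy indexing gather and flatten the rows in a single step. The helper name row_combinations is hypothetical, and whether this is actually faster depends on the data. The number of rows still grows as C(n, k), so very large inputs remain expensive.

```python
import itertools

import numpy as np
import pandas as pd


def row_combinations(df, k):
    """Hypothetical helper: all k-row combinations of df, laid out side by side."""
    values = df.to_numpy()
    n_rows, n_cols = values.shape
    # Combine row positions (plain ints) instead of the row arrays themselves
    idx = np.array(
        list(itertools.combinations(range(n_rows), k)), dtype=np.intp
    ).reshape(-1, k)
    # values[idx] has shape (n_combinations, k, n_cols); merge the last two axes
    flat = values[idx].reshape(len(idx), k * n_cols)
    # Repeat the original column names once per row in a combination
    return pd.DataFrame(flat, columns=np.tile(df.columns, k))


df = pd.DataFrame({'A': ['a', 'b', 'c', 'd'], 'B': ['1', '2', '3', '4']})
combination_df = row_combinations(df, 3)
print(combination_df)
```

For the sample df this should give the same four rows as the output above. The sketch assumes k is at least 1.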