Say I have a dataframe with 2 columns. How would I create all possible combinations of its rows for a specific combination size? Each row of the df should be treated as 1 item in the combination rather than as 2 separate items. I want the columns of each combination to be appended to the right. The solution should ideally be efficient, since generating all the combinations takes a long time when the dataframe is large.
For example, I want to create all possible combinations with a combination size of 3.
import pandas as pd
df = pd.DataFrame({'A':['a','b','c','d'], 'B':['1','2','3','4']})
How would I get my dataframe to look like this?
   A  B  A  B  A  B
0  a  1  b  2  c  3
1  a  1  b  2  d  4
2  a  1  c  3  d  4
3  b  2  c  3  d  4
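One approach is to use itertools to generate the combinations. itertools.combinations produces every combination of rows of the requested size, and itertools.chain flattens each combination into a single row of values. combination_df is then created from the flattened combinations, and the column labels are generated dynamically so that 'A' and 'B' repeat once for each item in a combination.
Sample: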
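import itertools
combination_size = 3
# Every combination of `combination_size` rows; each row counts as one item.
combinations = list(itertools.combinations(df.values, combination_size))
# Flatten each combination into one row and repeat the column labels per item.
combination_df = pd.DataFrame(
    [list(itertools.chain(*comb)) for comb in combinations],
    columns=[col for _ in range(combination_size) for col in df.columns]
)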
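To see what each of the two itertools calls contributes, here is a minimal sketch on the same four-row frame. The names `first` and `flat` are only illustrative and are not part of the answer's code.

```python
import itertools
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c', 'd'], 'B': ['1', '2', '3', '4']})

# combinations yields tuples of row arrays; take the first one to inspect it.
first = next(itertools.combinations(df.values, 3))
print([row.tolist() for row in first])  # [['a', '1'], ['b', '2'], ['c', '3']]

# chain joins the row arrays end to end, giving one flat output row.
flat = list(itertools.chain(*first))
print(flat)  # ['a', '1', 'b', '2', 'c', '3']
```

Each flattened list becomes one row of the final dataframe, which is why the column labels have to repeat once per item.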
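EDIT: Optimisation as suggested by @ouroboros1
from itertools import chain
import numpy as np
# Pass a generator of flattened rows and let np.tile repeat the column labels.
combination_df = pd.DataFrame(
    (chain.from_iterable(c) for c in combinations),
    columns=np.tile(df.columns, combination_size)
)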
Output
   A  B  A  B  A  B
0  a  1  b  2  c  3
1  a  1  b  2  d  4
2  a  1  c  3  d  4
3  b  2  c  3  d  4
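Since the question asks about efficiency, a variation that may be worth timing on large frames is to combine row positions instead of row arrays and let NumPy gather the values in one indexing step. This is a sketch of a different technique from the answer above, not a measured improvement; the function name `row_combinations` is hypothetical. Note that `to_numpy()` on mixed-type columns returns an object array, so original dtypes are not preserved.

```python
from itertools import combinations
from math import comb

import numpy as np
import pandas as pd


def row_combinations(df, size):
    """Return every `size`-row combination of df, laid out side by side."""
    # Combine integer row positions rather than the row arrays themselves.
    idx = np.array(list(combinations(range(len(df)), size)), dtype=np.intp)
    idx = idx.reshape(-1, size)  # keeps a 2-D shape even when there are no combinations
    # Fancy indexing gives shape (n_combos, size, n_cols); merge the last two axes.
    values = df.to_numpy()[idx].reshape(len(idx), size * df.shape[1])
    return pd.DataFrame(values, columns=np.tile(df.columns, size))


df = pd.DataFrame({'A': ['a', 'b', 'c', 'd'], 'B': ['1', '2', '3', '4']})
result = row_combinations(df, 3)

# math.comb gives the number of rows to expect before building anything.
print(comb(len(df), 3), result.shape)  # 4 (4, 6)
```

Whichever variant is used, the number of output rows grows as `comb(len(df), size)`, so checking that count first is a cheap way to tell whether the result will fit in memory.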