Tags: python, pandas, combinations

Generate combinations by randomly selecting a row from multiple groups (using pandas)


I have a pandas dataframe df that looks like this (a toy version; the real df contains many more columns and groups):

group  sub  fruit
a      1    apple
a      2    banana
a      3    orange
b      1    pear
b      2    strawberry
b      3    cherry
c      1    kiwi
c      2    tomato
c      3    lemon

All groups have the same number of rows. I am trying to generate a new dataframe that contains all combinations of group and sub, where each combination is formed by randomly selecting one row from each group.

Desired output:

combo  group  sub  fruit
1      a      1    apple
1      b      1    pear
1      c      1    kiwi
2      a      2    banana
2      b      2    strawberry
2      c      1    kiwi
3      a      3    orange
3      b      2    strawberry
3      c      1    kiwi
4      a      2    banana
4      b      2    strawberry
4      c      3    lemon
5      a      3    orange
5      b      3    cherry
5      c      3    lemon
...

In this particular example, I would expect 27 different combos (3 × 3 × 3). The question "Randomly select a row from each group using pandas" seems helpful, but I haven't been able to use it to iteratively generate every combination.


Solution

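  • You can use itertools.product on the per-group row labels:

    from itertools import product
    import pandas as pd

    # df.index.groupby(...) returns {group: Index of row labels}; product(*...)
    # then yields one tuple of labels per combination (one label from each group)
    out = pd.concat({i: df.loc[list(idx)] for i, idx in
                     enumerate(product(*df.index.groupby(df['group']).values()), start=1)})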
    

    Output (the outer index level is the combo number, the inner level is the original row label):

         group  sub   fruit
    1  0     a    1   apple
       3     b    1    pear
       6     c    1    kiwi
    2  0     a    1   apple
       3     b    1    pear
    ...    ...  ...     ...
    26 5     b    3  cherry
       7     c    2  tomato
    27 2     a    3  orange
       5     b    3  cherry
       8     c    3   lemon
    
    [81 rows x 3 columns]
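
The result above carries the combo number as an index level, whereas the desired output in the question has it as a regular `combo` column. A minimal sketch of one way to get that shape is shown below. It rebuilds the toy dataframe from the question so it runs on its own; the names `groups`, `combos`, and `flat` are illustrative and not part of the original answer.

```python
from itertools import product

import pandas as pd

# Toy dataframe from the question
df = pd.DataFrame({
    "group": list("aaabbbccc"),
    "sub": [1, 2, 3] * 3,
    "fruit": ["apple", "banana", "orange", "pear", "strawberry",
              "cherry", "kiwi", "tomato", "lemon"],
})

# Map each group to the Index of its row labels
groups = df.index.groupby(df["group"])

# Cartesian product: one tuple of row labels per combo (one label per group)
combos = list(product(*groups.values()))

# Stack the selected rows, keyed by combo number; name the outer level "combo"
out = pd.concat(
    {i: df.loc[list(idx)] for i, idx in enumerate(combos, start=1)},
    names=["combo"],
)

# Move the combo level into a column and discard the original row labels
flat = out.reset_index(level="combo").reset_index(drop=True)

print(flat.head(6))
```

The combos come out in the deterministic order produced by `product`. If they should appear in random order, as in the question's desired output, one option is to shuffle the `combos` list (for example with `random.shuffle`) before numbering it. Note that the number of combos grows as the product of the group sizes, so this approach may become impractical for many or large groups.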