Search code examples
pythonpandasdataframepython-itertools

Pandas find all combinations of rows under a budget


I am trying to figure out a way to determine all possible combinations of rows within a DataFrame that are below a budget, so let's say I have a dataframe like this:

data = [['Bread', 9, 'Food'], ['Shoes', 20, 'Clothes'], ['Shirt', 15, 'Clothes'], ['Milk', 5, 'Drink'], ['Cereal', 8, 'Food'], ['Chips', 10, 'Food'], ['Beer', 15, 'Drink'], ['Popcorn', 3, 'Food'], ['Ice Cream', 6, 'Food'], ['Soda', 4, 'Drink']]
df = pd.DataFrame(data, columns = ['Item', 'Price', 'Type'])
df

Data

Item       Price  Type
Bread      9      Food
Shoes      20     Clothes
Shirt      15     Clothes
Milk       5      Drink
Cereal     8      Food
Chips      10     Food
Beer       15     Drink
Popcorn    3      Food
Ice Cream  6      Food
Soda       4      Drink

I want to find every combination that I could purchase for under a specific budget, let's say $35 for this example, while only getting one of each type. I'd like to get a new dataframe made up of rows for each combination that works with each item in its own column.

I was trying to do it using itertools.product, but this can combine and add columns, but what I really need to do is combine and add a specific column based on values in another column. I'm a bit stumped now.

Thanks for your help!


Solution

  • Here a way using powerset recipe from itertools with pd.concat

    from itertools import chain, combinations
    
    def powerset(iterable):
        "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
        s = list(iterable)
        return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
    
    df_groups = pd.concat([df.reindex(l).assign(grp=n) for n, l in 
                           enumerate(powerset(df.index)) 
                           if (df.loc[l, 'Price'].sum() <= 35)])
    

    Outputs a single dataframe with groups of product that meet $35 condition:

              Item  Price     Type  grp
    0       Bread      9     Food    1
    1       Shoes     20  Clothes    2
    2       Shirt     15  Clothes    3
    3        Milk      5    Drink    4
    4      Cereal      8     Food    5
    ..        ...    ...      ...  ...
    3        Milk      5    Drink  752
    4      Cereal      8     Food  752
    7     Popcorn      3     Food  752
    8   Ice Cream      6     Food  752
    9        Soda      4    Drink  752
    

    How many ways this came combined to meet $35 budget?

    df_groups['grp'].nunique()
    

    Output:

    258
    

    Details:

    There are a couple of tricks/methods that are used here. First, we are using the index of the dataframe to create groups of rows or items using powerset. Next, we are using enumerate to identify each group and with assign creating a new column in a dataframe with that group number from enumerate.

    Modify to capture no more than one of each type:

    df_groups = pd.concat([df.reindex(l).assign(grp=n) for n, l in 
                           enumerate(powerset(df.index)) 
                           if ((df.loc[l, 'Price'].sum() <= 35) & 
                               (df.loc[l, 'Type'].value_counts()==1).all())])
    

    How many groups?

    df_groups['grp'].nunique()
    62
    

    Get exactly one for each Type:

    df_groups = pd.concat([df.reindex(l).assign(grp=n) for n, l in 
                           enumerate(powerset(df.index)) 
                           if ((df.loc[l, 'Price'].sum() <= 35) & 
                               (df.loc[l, 'Type'].value_counts()==1).all()&
                               (len(df.loc[l, 'Type']) == 3))])
    

    How many groups?

    df_groups['grp'].nunique()
    21