I am trying to figure out a way to determine all possible combinations of rows within a DataFrame that are below a budget, so let's say I have a dataframe like this:
data = [['Bread', 9, 'Food'], ['Shoes', 20, 'Clothes'], ['Shirt', 15, 'Clothes'], ['Milk', 5, 'Drink'], ['Cereal', 8, 'Food'], ['Chips', 10, 'Food'], ['Beer', 15, 'Drink'], ['Popcorn', 3, 'Food'], ['Ice Cream', 6, 'Food'], ['Soda', 4, 'Drink']]
df = pd.DataFrame(data, columns = ['Item', 'Price', 'Type'])
df
Data
Item Price Type
Bread 9 Food
Shoes 20 Clothes
Shirt 15 Clothes
Milk 5 Drink
Cereal 8 Food
Chips 10 Food
Beer 15 Drink
Popcorn 3 Food
Ice Cream 6 Food
Soda 4 Drink
I want to find every combination that I could purchase for under a specific budget, let's say $35 for this example, while only getting one of each type. I'd like to get a new dataframe made up of rows for each combination that works with each item in its own column.
I was trying to do it using itertools.product, but this can combine and add columns, but what I really need to do is combine and add a specific column based on values in another column. I'm a bit stumped now.
Thanks for your help!
Here a way using powerset
recipe from itertools
with pd.concat
from itertools import chain, combinations
def powerset(iterable):
"powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
s = list(iterable)
return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
df_groups = pd.concat([df.reindex(l).assign(grp=n) for n, l in
enumerate(powerset(df.index))
if (df.loc[l, 'Price'].sum() <= 35)])
Outputs a single dataframe with groups of product that meet $35 condition:
Item Price Type grp
0 Bread 9 Food 1
1 Shoes 20 Clothes 2
2 Shirt 15 Clothes 3
3 Milk 5 Drink 4
4 Cereal 8 Food 5
.. ... ... ... ...
3 Milk 5 Drink 752
4 Cereal 8 Food 752
7 Popcorn 3 Food 752
8 Ice Cream 6 Food 752
9 Soda 4 Drink 752
How many ways this came combined to meet $35 budget?
df_groups['grp'].nunique()
Output:
258
Details:
There are a couple of tricks/methods that are used here. First, we are using the index of the dataframe to create groups of rows or items using powerset
. Next, we are using enumerate
to identify each group and with assign
creating a new column in a dataframe with that group number from enumerate.
df_groups = pd.concat([df.reindex(l).assign(grp=n) for n, l in
enumerate(powerset(df.index))
if ((df.loc[l, 'Price'].sum() <= 35) &
(df.loc[l, 'Type'].value_counts()==1).all())])
How many groups?
df_groups['grp'].nunique()
62
df_groups = pd.concat([df.reindex(l).assign(grp=n) for n, l in
enumerate(powerset(df.index))
if ((df.loc[l, 'Price'].sum() <= 35) &
(df.loc[l, 'Type'].value_counts()==1).all()&
(len(df.loc[l, 'Type']) == 3))])
How many groups?
df_groups['grp'].nunique()
21