I have a dataframe of seller, item, price, shipping, free shipping minimum, count available, and count needed. My goal is to find the cheapest combination of seller and item based on their total, which is calculated later (the code for this is shown below). Sample data is included below:
import pandas as pd
item1 = ['item 1', 'item 2', 'item 1', 'item 1', 'item 2']
seller1 = ['Seller 1', 'Seller 2', 'Seller 3', 'Seller 4', 'Seller 1']
price1 = [1.85, 1.94, 2.00, 2.00, 2.02]
shipping1 = [0.99, 0.99, 0.99, 2.99, 0.99]
freeship1 = [5, 5, 5, 50, 5]
countavailable1 = [1, 2, 2, 5, 2]
countneeded1 = [2, 1, 2, 2, 1]
df1 = pd.DataFrame({'Seller': seller1,
                    'Item': item1,
                    'Price': price1,
                    'Shipping': shipping1,
                    'Free Shipping Minimum': freeship1,
                    'Count Available': countavailable1,
                    'Count Needed': countneeded1})
# create a column that states whether the seller has the full count needed.
# this will be used for sorting, to prioritize the smallest number of orders possible
for index, row in df1.iterrows():
    if row['Count Available'] >= row['Count Needed']:
        df1.at[index, 'Fulfills Count Needed'] = 'Yes'
    else:
        df1.at[index, 'Fulfills Count Needed'] = 'No'
# don't want to calc price based on [count available], so check whether the seller has the count I need and calc cost based on [count needed].
# if the seller doesn't have [count needed], then calc cost on [count available].
for index, row in df1.iterrows():
    if row['Count Available'] >= row['Count Needed']:
        df1.at[index, 'Price x Count'] = row['Count Needed'] * row['Price']
    else:
        df1.at[index, 'Price x Count'] = row['Count Available'] * row['Price']
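The two loops above can probably be written without iterrows. Below is a minimal sketch, assuming the same column names as df1 but a small stand-in dataframe (df and units are names made up for this illustration). It relies on the fact that "count needed if available, otherwise count available" is just the smaller of the two counts.

```python
import numpy as np
import pandas as pd

# Stand-in data using the same column names as df1 (first three sample rows).
df = pd.DataFrame({
    'Price': [1.85, 1.94, 2.00],
    'Count Available': [1, 2, 2],
    'Count Needed': [2, 1, 2],
})

# 'Yes' where the seller can cover the full count needed, 'No' otherwise.
fulfills = df['Count Available'] >= df['Count Needed']
df['Fulfills Count Needed'] = np.where(fulfills, 'Yes', 'No')

# Units actually bought: the smaller of count available and count needed.
units = df[['Count Available', 'Count Needed']].min(axis=1)
df['Price x Count'] = units * df['Price']

print(df)
```

This should give the same columns as the loops, but I have only checked it against the sample rows shown here.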
However, any one seller can sell multiple items. I want to minimize how much shipping is paid, so I want to group items together by seller. I group them using an approach I saw in another thread: the .first() method, which keeps every column in the new grouped dataframe.
# Don't calc [Total] until sellers have been grouped
# use first() method to return all columns and perform no other aggregations
grouped1 = df1.sort_values('Price').groupby(['Seller', 'Item']).first()
It's at this point that I want to calculate the total by seller. I have the following code, but it calculates the total for each item rather than for each seller. This means that shipping is added multiple times per seller, depending on how many items are in each group, or that free shipping is not applied when the seller's combined Price x Count is over the free shipping minimum.
# calc [Total]
for index, row in grouped1.iterrows():
    if (row['Free Shipping Minimum'] == 50) & (row['Price x Count'] > 50):
        grouped1.at[index, 'Total'] = row['Price x Count'] + 0
    elif (row['Free Shipping Minimum'] == 5) & (row['Price x Count'] > 5):
        grouped1.at[index, 'Total'] = row['Price x Count'] + 0
    else:
        grouped1.at[index, 'Total'] = row['Price x Count'] + row['Shipping']
It actually looks like I may need to sum Price x Count for each seller while calculating Total, but that's essentially the same problem, since I don't know how to calculate a column per row of the outer index. What methods can I use to do this?
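One way to get a per-seller sum onto every row is groupby with transform. This is a minimal sketch, assuming the column names used above and toy values that are not the sample data (df is a stand-in name). It compares the seller's subtotal to the free shipping minimum once, instead of comparing each item row.

```python
import numpy as np
import pandas as pd

# Toy values chosen so that one seller crosses the free shipping minimum.
df = pd.DataFrame({
    'Seller': ['Seller 1', 'Seller 2', 'Seller 1'],
    'Item': ['item 1', 'item 2', 'item 2'],
    'Price x Count': [1.85, 1.94, 4.04],
    'Shipping': [0.99, 0.99, 0.99],
    'Free Shipping Minimum': [5, 5, 5],
})

# transform('sum') returns one value per original row, so every row of a
# seller carries that seller's subtotal.
df['Subtotal'] = df.groupby('Seller')['Price x Count'].transform('sum')

# Shipping is added once per seller, and only when the subtotal does not
# exceed the free shipping minimum.
df['Total'] = np.where(df['Subtotal'] > df['Free Shipping Minimum'],
                       df['Subtotal'],
                       df['Subtotal'] + df['Shipping'])

print(df)
```

Here both rows of Seller 1 share a subtotal of 5.89 and pay no shipping, while Seller 2 stays under the minimum and pays 0.99. Whether the comparison should be > or >= depends on the seller's policy, which the data does not say.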
Also, if anyone has any tips on how to tackle the second half of my goal, fire away. I want to return only as many of each item as I need. For example, I need 2 of "item 1" and 2 of "item 2". If "Seller 1" has 2 of "item 1" and 1 of "item 2" while "Seller 2" has 1 of "item 1" and 1 of "item 2", I want everything from "Seller 1" (assuming it is cheapest) but only the 1 "item 2" from "Seller 2". This seems like it will play into calculating the total column, but I'm not sure how to implement it.
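For the "only as many as I need" part, a greedy pass per item might be a starting point. This is a rough sketch, not an optimal solver: it fills each item from the cheapest unit price first and ignores shipping and free shipping minimums entirely. The prices, the needed dict, the allocate function, and the 'Count To Buy' column are all made up for this illustration; the counts follow the example in the paragraph above.

```python
import pandas as pd

# Counts follow the Seller 1 / Seller 2 example; prices are hypothetical.
offers = pd.DataFrame({
    'Seller': ['Seller 1', 'Seller 1', 'Seller 2', 'Seller 2'],
    'Item': ['item 1', 'item 2', 'item 1', 'item 2'],
    'Price': [1.00, 1.00, 1.50, 1.50],
    'Count Available': [2, 1, 1, 1],
})
needed = {'item 1': 2, 'item 2': 2}


def allocate(offers, needed):
    """Greedy allocation: fill each item from the cheapest offers first."""
    rows = []
    ordered = offers.sort_values('Price', kind='stable')
    for item, group in ordered.groupby('Item'):
        remaining = needed.get(item, 0)
        for _, offer in group.iterrows():
            if remaining <= 0:
                break
            # Never take more than the seller has or more than is still needed.
            take = min(remaining, offer['Count Available'])
            rows.append({'Seller': offer['Seller'], 'Item': item,
                         'Price': offer['Price'], 'Count To Buy': int(take)})
            remaining -= take
    return pd.DataFrame(rows)


plan = allocate(offers, needed)
print(plan)
```

With these numbers the plan takes both units of "item 1" and the single "item 2" from "Seller 1", and one "item 2" from "Seller 2". Because shipping is ignored, the result is not guaranteed to be the cheapest overall; finding that would mean comparing combinations of sellers with shipping included.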
I ended up deciding to first group by Seller and sum Price x Count to find the Subtotals, convert this into a dataframe, and then merge df1 with the new subtotal dataframe to create the grouped dataframe. Then I created the Total column using the np.where suggestion (which is much more elegant than my for loop and handles NaN values easily). Finally, grouping by Seller, Total, and Item returns the result I wanted. The finalized code is below:
import pandas as pd
import numpy as np
item1 = ['item 1', 'item 2', 'item 1', 'item 1', 'item 2']
seller1 = ['Seller 1', 'Seller 2', 'Seller 3', 'Seller 4', 'Seller 1']
price1 = [1.85, 1.94, 2.69, 2.00, 2.02]
shipping1 = [0.99, 0.99, 0.99, 2.99, 0.99]
freeship1 = [5, 5, 5, 50, 5]
countavailable1 = [1, 2, 2, 5, 2]
countneeded1 = [2, 1, 2, 2, 1]
df1 = pd.DataFrame({'Seller': seller1,
                    'Item': item1,
                    'Price': price1,
                    'Shipping': shipping1,
                    'Free Shipping Minimum': freeship1,
                    'Count Available': countavailable1,
                    'Count Needed': countneeded1})
# create a column that states whether the seller has the full count needed.
# this will be used for sorting, to prioritize the smallest number of orders possible
for index, row in df1.iterrows():
    if row['Count Available'] >= row['Count Needed']:
        df1.at[index, 'Fulfills Count Needed'] = 'Yes'
    else:
        df1.at[index, 'Fulfills Count Needed'] = 'No'
# don't want to calc price based on [count available], so check whether the seller has the count I need and calc cost based on [count needed].
# if the seller doesn't have [count needed], then calc cost on [count available].
for index, row in df1.iterrows():
    if row['Count Available'] >= row['Count Needed']:
        df1.at[index, 'Price x Count'] = row['Count Needed'] * row['Price']
    else:
        df1.at[index, 'Price x Count'] = row['Count Available'] * row['Price']
# subtotals by seller, then assign calcs to column called [Subtotal] and merge into dataframe
subtotals = df1.groupby(['Seller'])['Price x Count'].sum().reset_index()
subtotals.rename({'Price x Count':'Subtotal'}, axis=1, inplace=True)
grouped = df1.merge(subtotals[['Subtotal', 'Seller']], on='Seller')
# calc [Total]
grouped['Total'] = np.where(grouped['Subtotal'] > grouped['Free Shipping Minimum'],
                            grouped['Subtotal'],
                            grouped['Subtotal'] + grouped['Shipping'])
grouped.groupby(['Seller', 'Total', 'Item']).first()
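To compare sellers side by side, the grouped result can be collapsed to one row per seller and sorted by Total. This is a minimal sketch, assuming a dataframe shaped like grouped above with hard-coded illustrative totals; per_seller and the Items column are names made up for this example.

```python
import pandas as pd

# Stand-in for the merged dataframe: Total repeats on every row of a seller.
grouped = pd.DataFrame({
    'Seller': ['Seller 1', 'Seller 1', 'Seller 2'],
    'Item': ['item 1', 'item 2', 'item 2'],
    'Total': [4.86, 4.86, 2.93],
})

# One row per seller: list the items covered and keep the per-seller total.
per_seller = (grouped.groupby('Seller', as_index=False)
                     .agg(Items=('Item', lambda s: ', '.join(s)),
                          Total=('Total', 'first'))
                     .sort_values('Total'))

print(per_seller)
```

The totals are only directly comparable when the sellers cover the same items and counts, so this ordering is a view for inspection rather than a final answer.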