Have got two dataframe: df1
Store Category split_table tot_tables
11 FRUIT 7.0 12
11 VEG 5.0 12
13 FRUIT 5.0 11
13 VEG 6.0 11
df2
Site sum_list
11 [1,2,3,4,5,6.5,8,9.5,11,12]
13 [1,2,3,4,5,6.5,7.5,9,10,11]
have to generate new column df1['final_table']
by following below points:
final_table
' value is calculated by taking the nearest value from df2['sum_list']
from respective store.final_table
' value is calculated by df1['tot_tables'] - 'newly calculated 'final_table' for FRUIT category'
. So that, sum of final_table value for each store is equivalent to tot_table.Expected Output: df1
Store Category split_table tot_tables final_table
11 FRUIT 7.0 12 6.5
11 VEG 5.0 12 5.5
13 FRUIT 5.0 11 5.0
13 VEG 6.0 11 6.0
Any help will be appreciated!
I'm assuming you have only one FRUIT and VEG in each store:
# calculate fruit
m = df2.set_index("Site")["sum_list"].to_dict()
mask = df1["Category"].eq("FRUIT")
df1.loc[mask, "final_table"] = df1[mask].apply(
lambda x: min(m[x["Store"]], key=lambda y: abs(y - x["split_table"])),
axis=1,
)
# calculate vegetable
m = df1[mask].set_index("Store")["final_table"].to_dict()
mask = df1["Category"].eq("VEG")
df1.loc[mask, "final_table"] = df1[mask].apply(
lambda x: x["tot_tables"] - m[x["Store"]],
axis=1,
)
print(df1)
Prints:
Store Category split_table tot_tables final_table
0 11 FRUIT 7.0 12 6.5
1 11 VEG 5.0 12 5.5
2 13 FRUIT 5.0 11 5.0
3 13 VEG 6.0 11 6.0