I have two dataframes like as below
proj_df = pd.DataFrame({'reg_id':[1,2,3,4,5,6,7],
'partner': ['ABC_123','ABC_123','ABC_123','ABC_123','ABC_123','ABC_123','ABC_123'],
'part_no':['P123','P123','P123','P123','P123','P123','P123'],
'cust_info':['Apple','Apple','Apple','Apple','Apple','Apple','Tesla'],
'qty_1st_year':[100,100,600,150,50,0,10]})
order_df = pd.DataFrame({'partner': ['ABC_123','ABC_123','JKL_123','MNO_123'],
'part_no':['P123','P123','Q123','P567'],
'cust_info':['Apple','Hyundai','REON','Renault'],
'order_qty':[1000,600,50,0]})
I would like to do the below
a) Merge two dataframes based on partner,part_no,cust_info
b) split the order_qty
column from order_df
and assign the appropriate portion to a new column called assigned_qty
c) appropriate portion is determined by the percentage distribution of qty_1st_year
. Meaning, you divide individual qty_1st_year
value by the total sum of Qty_1st_year
for each group of partner,part_no and cust_info
.
So, I tried the below
sum_df = proj_df.groupby(['partner','part_no','cust_info'])['qty_1st_year'].sum().reset_index()
sum_df.columns = ['partner','part_no','cust_info','total_qty_all_project']
t1=proj_df.merge(order_df,on=['partner','part_no','cust_info'],how='left')
t2 = t1.merge(sum_df,on=['partner','part_no','cust_info'],how='left')
t2['pct_value'] = (t2['qty_1st_year']/t2['total_qty_all_project'])*100
proj_df['assigned_value'] = (t2['order_qty']*t2['pct_value'])/100
While this seems to work fine, I would like to know is there any other better and elegant way to do this task.
I expect my output to be like as below
Use GroupBy.transform
with sum
for new column for avoid double merge, last if need remove columns after multiplication and division add DataFrame.pop
, last if necessary divide by 10
for expected ouput:
proj_df['total_qty'] = (proj_df.groupby(['partner','part_no','cust_info'])['qty_1st_year']
.transform('sum'))
df=proj_df.merge(order_df,on=['partner','part_no','cust_info'],how='left')
df['assigned_value'] = (df.pop('order_qty')*
(df['qty_1st_year']/
df.pop('total_qty'))).div(10)
print (df)
reg_id partner part_no cust_info qty_1st_year assigned_value
0 1 ABC_123 P123 Apple 100 10.0
1 2 ABC_123 P123 Apple 100 10.0
2 3 ABC_123 P123 Apple 600 60.0
3 4 ABC_123 P123 Apple 150 15.0
4 5 ABC_123 P123 Apple 50 5.0
5 6 ABC_123 P123 Apple 0 0.0
6 7 ABC_123 P123 Tesla 10 NaN