I'm using Pandas in Jupyter notebook to compute the following from dataset structure below which contains duplicates by the column 'product'.
| name | val_id |val_amt | our_val_amt | val_against | product | **NEW FIELD**
| compx | xx1 | 530 | 424 | 418 | XL | 290
| compx | xx1 | 530 | 424 | 134 | CL | 134
| compx | xx2 | 590 | 472 | 600 | XL | 369
| compx | xx2 | 590 | 472 | 103 | CL | 103
I am trying to loop through the 'name' column (only 1 in my example but have more) and perform the following if/then conditions. Before finally structuring the data as per the output.
Conditions
1.if 'product' = 'CL' and is less than 'val_against' insert the 'saved_against' value in 'NEW FIELD'. e.g val_against (134) < our_val_amt (424) therefore 'NEW FIELD' = 134 (row2).
2.if product = 'XL' and is in the same 'val_id' (xx1). The remainder to be subtracted from to be inserted in 'NEW FIELD'. 'our_val_amt' = 424 - 134 (from step 1) = 290. This inserted above NEW FIELD.
3.Repeat steps for val_id xx2. NEW FIELD CL = 103 and XL = 472 - 103 = 369.
4.Last Step restructure the data to desired output below.
I've tried grouping the variables together and having the 'product' as columns but unfortunately i receive 6000+ columns and have no idea on how to compute the calculation for this scenario. i've also attempted to loop through without much success.
Desired output is:
val id
---------------
| xx1 | xx2 |
| our_val_amt |
---------------------------------------
| product |val_against | 424 | 472 |
----------------------------------------
| XL | 418 | 290 | |
| CL | 134 | 134 | |
| XL | 600 | | 369 |
| CL | 103 | | 103 |
Is this what you are looking for?
col = '**NEW FIELD**'
idx = ['product','val_against','val_id','our_val_amt']
r = df.set_index(idx).unstack([-2,-1])[col].reset_index().fillna('')
r
product val_against xx1 xx2
424 472
CL 103 103.0
CL 134 134.0
XL 418 290.0
XL 600 369.0
NOTE: Remove the last .fillna('')
to keep the values as numeric and leave missing values as nan
as that would be much better to work with in later steps.