Search code examples
pythonpandasdataframejupyter-notebookdata-analysis

How to do python loop with if/then statement?


I'm using Pandas in Jupyter notebook to compute the following from dataset structure below which contains duplicates by the column 'product'.

| name  | val_id |val_amt | our_val_amt | val_against | product | **NEW FIELD**
| compx |  xx1   | 530    |   424       |    418      |   XL    |   290
| compx |  xx1   | 530    |   424       |    134      |   CL    |   134
| compx |  xx2   | 590    |   472       |    600      |   XL    |   369
| compx |  xx2   | 590    |   472       |    103      |   CL    |   103

I am trying to loop through the 'name' column (only 1 in my example but have more) and perform the following if/then conditions. Before finally structuring the data as per the output.

Conditions

1.if 'product' = 'CL' and is less than 'val_against' insert the 'saved_against' value in 'NEW FIELD'. e.g val_against (134) < our_val_amt (424) therefore 'NEW FIELD' = 134 (row2).

2.if product = 'XL' and is in the same 'val_id' (xx1). The remainder to be subtracted from to be inserted in 'NEW FIELD'. 'our_val_amt' = 424 - 134 (from step 1) = 290. This inserted above NEW FIELD.

3.Repeat steps for val_id xx2. NEW FIELD CL = 103 and XL = 472 - 103 = 369.

4.Last Step restructure the data to desired output below.

I've tried grouping the variables together and having the 'product' as columns but unfortunately i receive 6000+ columns and have no idea on how to compute the calculation for this scenario. i've also attempted to loop through without much success.

Desired output is:

                           val id
                        ---------------
                        | xx1   | xx2 |
                        | our_val_amt |
---------------------------------------
| product  |val_against |  424   | 472 |
----------------------------------------
| XL       |   418      |  290   |     |
| CL       |   134      |  134   |     |
| XL       |   600      |        | 369 |
| CL       |   103      |        | 103 |

Solution

  • Is this what you are looking for?

    col = '**NEW FIELD**'
    idx = ['product','val_against','val_id','our_val_amt']
    
    r = df.set_index(idx).unstack([-2,-1])[col].reset_index().fillna('')
    r
    
    product val_against    xx1    xx2
                           424    472
         CL         103         103.0
         CL         134  134.0       
         XL         418  290.0       
         XL         600         369.0
    

    NOTE: Remove the last .fillna('') to keep the values as numeric and leave missing values as nan as that would be much better to work with in later steps.