Pandas Change values of a dataframe based on an override

I have a pandas dataframe which looks something like this.

orig |  dest |  type | class |  BKT   | BKT_order | value   | fc_Cap | sc_Cap
-----+-------+-------+-------+--------+-----------+---------+--------+---------
 AMD |  TRY  |   SA  | fc    |  MA    |   1       |   12.04 |   20   |   50
 AMD |  TRY  |   SA  | fc    |  TY    |   2       |   11.5  |   20   |   50
 AMD |  TRY  |   SA  | fc    |  NY    |   3       |   17.7  |   20   |   50
 AMD |  TRY  |   SA  | fc    |  MU    |   4       |   09.7  |   20   |   50
 AMD |  TRY  |   PE  | fc    |  RE    |   1       |   09.7  |   20   |   50
 AMD |  TRY  |   PE  | sc    |  EW    |   5       |   07.7  |   20   |   50
 NCL |  MNK  |   PE  | sc    |  PO    |   2       |   08.7  |   20   |   50
 NCL |  MNK  |   PE  | sc    |  TU    |   3       |   12.5  |   20   |   50
 NCL |  MNK  |   PE  | sc    |  MA    |   1       |   16.7  |   20   |   50

Also i have an override Dataframe which may look something like this:

orig |  dest |  type |  max_BKT 
-----+-------+-------+-----------
 AMD |  TRY  |   SA  |  TY
 NCL |  MNK  |   PE  |  PO
 NCL |  AGZ  |   PE  |  PO

what i want to do is modify the original dataframe such that after comparison of orig dest type & BKT ( with max_BKT) values, the value column for any rows which have the BKT_order higher than or equal to the max_BKT in override DF is set to either fc_Cap or sc_Cap depending on the class value.

For Example in above scenario,

Since the Override DF sets max_BKT as TY for AMD | TRY | SA and the bucket order for TY is 2 in original Df, i need to set the value column equal to fc_Cap or sc_Cap depending on the value of class for all rows where BKT_order >= 2

So basically:

filter the rows for orig dest type combination
Get the BKT_order of max_BKT from the Original DF
for each row that matches the above criteria
- if class == fc update value column with fc_Cap
- if class == sc update value column with sc_Cap

So our original DF looks something like this:

orig |  dest |  type | class |  BKT   | BKT_order | value   | fc_Cap | sc_Cap
-----+-------+-------+-------+--------+-----------+---------+--------+---------
 AMD |  TRY  |   SA  | fc    |  MA    |   1       |   12.04 |   20   |   50
 AMD |  TRY  |   SA  | fc    |  TY    |   2       |   20    |   20   |   50
 AMD |  TRY  |   SA  | fc    |  NY    |   3       |   20    |   20   |   50
 AMD |  TRY  |   SA  | fc    |  MU    |   4       |   20    |   20   |   50
 AMD |  TRY  |   PE  | fc    |  RE    |   1       |   09.7  |   20   |   50
 AMD |  TRY  |   PE  | sc    |  EW    |   5       |   07.7  |   20   |   50
 NCL |  MNK  |   PE  | sc    |  PO    |   2       |   50    |   20   |   50
 NCL |  MNK  |   PE  | sc    |  TU    |   3       |   50    |   20   |   50
 NCL |  MNK  |   PE  | sc    |  MA    |   1       |   16.7  |   20   |   50

I have tried an approach to iterate over the override df and try to handle 1 row at a time but, i get stuck when i need to do a reverse lookup to get the BKT_order of the max_BKT from original df.

Hope that makes sense... i am fairly new to pandas.

Solution

That's a fairly complex task. The individual steps are straightforward though.

You need:

indexing lookup to find the cap values based on class
merge to match the max_BKT
mask+groupby.transform to identify the rows to mask

idx, cols = pd.factorize(df['class']+'_Cap')

group = ['orig', 'dest', 'type']

out = (
    df.merge(override, on=group, how='left')
    .assign(
        value=lambda x: x['value'].mask(
            x['BKT_order'].ge(
                x['BKT_order']
                .where(x['BKT'].eq(x['max_BKT']))
                .groupby([x[c] for c in group])
                .transform('first')
            ),
            x.reindex(cols, axis=1).to_numpy()[np.arange(len(x)), idx],
        )
    )
    .reindex(columns=df.columns)
)

Output:

  orig dest type class BKT  BKT_order  value  fc_Cap  sc_Cap
0  AMD  TRY   SA    fc  MA          1  12.04      20      50
1  AMD  TRY   SA    fc  TY          2  20.00      20      50
2  AMD  TRY   SA    fc  NY          3  20.00      20      50
3  AMD  TRY   SA    fc  MU          4  20.00      20      50
4  AMD  TRY   PE    fc  RE          1   9.70      20      50
5  AMD  TRY   PE    sc  EW          5   7.70      20      50
6  NCL  MNK   PE    sc  PO          2  50.00      20      50
7  NCL  MNK   PE    sc  TU          3  50.00      20      50
8  NCL  MNK   PE    sc  MA          1  16.70      20      50

Intermediates:

tmp = df.merge(override, on=group, how='left')
tmp['cap'] = tmp.reindex(cols, axis=1).to_numpy()[np.arange(len(tmp)), idx]
tmp['mask'] = tmp['BKT'].eq(tmp['max_BKT'])
tmp['masked_BKT'] = tmp['BKT_order'].where(tmp['mask'])
tmp['ref_BKT'] = tmp.groupby(group)['masked_BKT'].transform('first')
tmp['>= ref_BKT'] = tmp['BKT_order'].ge(tmp['ref_BKT'])

  orig dest type class BKT  BKT_order  value  fc_Cap  sc_Cap max_BKT  cap   mask  masked_BKT  ref_BKT  >= ref_BKT
0  AMD  TRY   SA    fc  MA          1  12.04      20      50      TY   20  False         NaN      2.0       False
1  AMD  TRY   SA    fc  TY          2  11.50      20      50      TY   20   True         2.0      2.0        True
2  AMD  TRY   SA    fc  NY          3  17.70      20      50      TY   20  False         NaN      2.0        True
3  AMD  TRY   SA    fc  MU          4   9.70      20      50      TY   20  False         NaN      2.0        True
4  AMD  TRY   PE    fc  RE          1   9.70      20      50     NaN   20  False         NaN      NaN       False
5  AMD  TRY   PE    sc  EW          5   7.70      20      50     NaN   50  False         NaN      NaN       False
6  NCL  MNK   PE    sc  PO          2   8.70      20      50      PO   50   True         2.0      2.0        True
7  NCL  MNK   PE    sc  TU          3  12.50      20      50      PO   50  False         NaN      2.0        True
8  NCL  MNK   PE    sc  MA          1  16.70      20      50      PO   50  False         NaN      2.0       False