I have a pandas dataframe which looks something like this.
orig | dest | type | class | BKT | BKT_order | value | fc_Cap | sc_Cap
-----+-------+-------+-------+--------+-----------+---------+--------+---------
AMD | TRY | SA | fc | MA | 1 | 12.04 | 20 | 50
AMD | TRY | SA | fc | TY | 2 | 11.5 | 20 | 50
AMD | TRY | SA | fc | NY | 3 | 17.7 | 20 | 50
AMD | TRY | SA | fc | MU | 4 | 09.7 | 20 | 50
AMD | TRY | PE | fc | RE | 1 | 09.7 | 20 | 50
AMD | TRY | PE | sc | EW | 5 | 07.7 | 20 | 50
NCL | MNK | PE | sc | PO | 2 | 08.7 | 20 | 50
NCL | MNK | PE | sc | TU | 3 | 12.5 | 20 | 50
NCL | MNK | PE | sc | MA | 1 | 16.7 | 20 | 50
Also i have an override Dataframe which may look something like this:
orig | dest | type | max_BKT
-----+-------+-------+-----------
AMD | TRY | SA | TY
NCL | MNK | PE | PO
NCL | AGZ | PE | PO
what i want to do is modify the original dataframe such that after comparison of orig
dest
type
& BKT
( with max_BKT
) values, the value
column for any rows which have the BKT_order
higher than or equal to the max_BKT
in override DF is set to either fc_Cap
or sc_Cap
depending on the class
value.
For Example in above scenario,
Since the Override DF sets max_BKT
as TY
for AMD | TRY | SA
and the bucket order for TY
is 2
in original Df, i need to set the value
column equal to fc_Cap
or sc_Cap
depending on the value of class
for all rows where BKT_order
>= 2
So basically:
orig
dest
type
combinationBKT_order
of max_BKT
from the Original DFclass == fc
update value column with fc_Capclass == sc
update value column with sc_CapSo our original DF looks something like this:
orig | dest | type | class | BKT | BKT_order | value | fc_Cap | sc_Cap
-----+-------+-------+-------+--------+-----------+---------+--------+---------
AMD | TRY | SA | fc | MA | 1 | 12.04 | 20 | 50
AMD | TRY | SA | fc | TY | 2 | 20 | 20 | 50
AMD | TRY | SA | fc | NY | 3 | 20 | 20 | 50
AMD | TRY | SA | fc | MU | 4 | 20 | 20 | 50
AMD | TRY | PE | fc | RE | 1 | 09.7 | 20 | 50
AMD | TRY | PE | sc | EW | 5 | 07.7 | 20 | 50
NCL | MNK | PE | sc | PO | 2 | 50 | 20 | 50
NCL | MNK | PE | sc | TU | 3 | 50 | 20 | 50
NCL | MNK | PE | sc | MA | 1 | 16.7 | 20 | 50
I have tried an approach to iterate over the override df and try to handle 1 row at a time but, i get stuck when i need to do a reverse lookup to get the BKT_order
of the max_BKT
from original df.
Hope that makes sense... i am fairly new to pandas.
That's a fairly complex task. The individual steps are straightforward though.
You need:
merge
to match the max_BKTmask
+groupby.transform
to identify the rows to maskidx, cols = pd.factorize(df['class']+'_Cap')
group = ['orig', 'dest', 'type']
out = (
df.merge(override, on=group, how='left')
.assign(
value=lambda x: x['value'].mask(
x['BKT_order'].ge(
x['BKT_order']
.where(x['BKT'].eq(x['max_BKT']))
.groupby([x[c] for c in group])
.transform('first')
),
x.reindex(cols, axis=1).to_numpy()[np.arange(len(x)), idx],
)
)
.reindex(columns=df.columns)
)
Output:
orig dest type class BKT BKT_order value fc_Cap sc_Cap
0 AMD TRY SA fc MA 1 12.04 20 50
1 AMD TRY SA fc TY 2 20.00 20 50
2 AMD TRY SA fc NY 3 20.00 20 50
3 AMD TRY SA fc MU 4 20.00 20 50
4 AMD TRY PE fc RE 1 9.70 20 50
5 AMD TRY PE sc EW 5 7.70 20 50
6 NCL MNK PE sc PO 2 50.00 20 50
7 NCL MNK PE sc TU 3 50.00 20 50
8 NCL MNK PE sc MA 1 16.70 20 50
Intermediates:
tmp = df.merge(override, on=group, how='left')
tmp['cap'] = tmp.reindex(cols, axis=1).to_numpy()[np.arange(len(tmp)), idx]
tmp['mask'] = tmp['BKT'].eq(tmp['max_BKT'])
tmp['masked_BKT'] = tmp['BKT_order'].where(tmp['mask'])
tmp['ref_BKT'] = tmp.groupby(group)['masked_BKT'].transform('first')
tmp['>= ref_BKT'] = tmp['BKT_order'].ge(tmp['ref_BKT'])
orig dest type class BKT BKT_order value fc_Cap sc_Cap max_BKT cap mask masked_BKT ref_BKT >= ref_BKT
0 AMD TRY SA fc MA 1 12.04 20 50 TY 20 False NaN 2.0 False
1 AMD TRY SA fc TY 2 11.50 20 50 TY 20 True 2.0 2.0 True
2 AMD TRY SA fc NY 3 17.70 20 50 TY 20 False NaN 2.0 True
3 AMD TRY SA fc MU 4 9.70 20 50 TY 20 False NaN 2.0 True
4 AMD TRY PE fc RE 1 9.70 20 50 NaN 20 False NaN NaN False
5 AMD TRY PE sc EW 5 7.70 20 50 NaN 50 False NaN NaN False
6 NCL MNK PE sc PO 2 8.70 20 50 PO 50 True 2.0 2.0 True
7 NCL MNK PE sc TU 3 12.50 20 50 PO 50 False NaN 2.0 True
8 NCL MNK PE sc MA 1 16.70 20 50 PO 50 False NaN 2.0 False