My organization uses special codes for various employee attributes. We are migrating to a new system and I have to map these codes to a new code based on certain logic.
Here is my mappings df Mappings
:
State Old_Mgmt New_Mgmt Old_ID New_ID New_Site
01 A001 A100 0000 0101 123
01 A002 A100 0000 0102
01 A003 A105 0000 0103 123
02 A001 A100 0000 0101
And here is EmployeeData
:
State Management ID Site
01 A001 0000 456
01 A002 0000 987
02 A002 0000 987
....
The logic for the mapping is to go through each row of EmployeeData
and if there is a match for State
, Management
, and ID
, then it will update to the corresponding New_
value. However for Site
, it will update the Site ID only if New_Site
is not blank/NaN. This mapping will modify the original dataframe.
Based on the above mapping the new EmployeeData
would be:
State Management ID Site
01 A100 0101 123 (modified this row)
01 A100 0102 987 (modified this row)
02 A002 0000 987
....
My initial thought process was to do something like this:
for i,r in EmployeeData.iterrows(): # For each employee row
# Create masks for the filters we are looking for
mask_state = Mappings['State'] == r['State']
mask_mgmt = Mappings['Old_Mgmt'] == r['Management']
mask_id = Mappings['Old_ID'] == r['ID']
# Filter mappings for the above 3 conditions
MATCH = Mappings[mask_state & mask_mgmt & mask_id]
if MATCH.empty: # No matches found
print("No matches found in mapping. No need to update. Skipping.")
continue
MATCH = MATCH.iloc[0] # If a match is found, it will correspond to only 1 row
EmployeeData.at[i, 'Management'] = MATCH['New_Mgmt']
EmployeeData.at[i, 'ID'] = MATCH['New_ID']
if pd.notna(MATCH['New_Site']):
EmployeeData.at[i, 'Site'] = MATCH['New_Site']
However this seems fairly inefficient because I have to filter mappings for every row. If only 1 column was being mapped, I would do something like:
# Make a dict mapping Old_Mgmt -> New_Mgmt
MGMT_MAPPING = pd.Series(Mappings['New_Mgmt'].values,index=Mappings['Old_Mgmt']).to_dict()
mask_state = Mappings['State'] = r['State']
EmployeeData.loc[mask_state, 'Management'] = EmployeeData.loc[mask_state, 'Management'].replace(MGMT_MAPPING)
But that would not work for my situation since I need to map multiple values
EmployeeData.update(EmployeeData
.rename(columns={'Management': 'Old_Mgmt', 'ID': 'Old_ID'})
.merge(Mappings.rename(columns={'New_Mgmt': 'Management', 'New_ID': 'ID', 'New_Site': 'Site'}),
how='left', on=['State', 'Old_Mgmt', 'Old_ID'], suffixes=('_old', None))
.replace('', None)[EmployeeData.columns]
)
Updated EmployeeData
:
State Management ID Site
0 01 A100 0101 123
1 01 A100 0102 987
2 02 A002 0000 987