Say I have a dataframe with columns 'Subtype' and 'Building Condition'.
Dataframe 1:
| Subtype | Building Condition |
|---|---|
| A | Good |
| B | Bad |
| C | Bad |
I want to map dataframe 1 using another dataframe based on the values of those two columns
Dataframe 2:
| | Good | Bad |
|---|---|---|
| A | Repair | Retrofit |
| B | Retrofit | Reconstruct |
| C | Reconstruct | Reconstruct |
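For concreteness, the two example frames can be constructed like this (the names `bldg_df` and `matrix_df` match the code below):

```python
import pandas as pd

# Dataframe 1: one row per building
bldg_df = pd.DataFrame({
    'Subtype': ['A', 'B', 'C'],
    'Building Condition': ['Good', 'Bad', 'Bad'],
})

# Dataframe 2: the lookup matrix, with Subtype as the index
# and Building Condition as the columns
matrix_df = pd.DataFrame(
    {'Good': ['Repair', 'Retrofit', 'Reconstruct'],
     'Bad': ['Retrofit', 'Reconstruct', 'Reconstruct']},
    index=['A', 'B', 'C'],
)
```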
What I did was iterate over the first dataframe, use the `DataFrame.at` accessor for each row, and append the results to a list:
```python
# assignment of intervention based on subtype and building condition
intervention_list = []
for index, row in bldg_df.iterrows():
    intervention = matrix_df.at[row['Subtype'], row['Building Condition']]
    intervention_list.append(intervention)
bldg_df['Intervention'] = intervention_list
```
So the resulting dataframe would be
| Subtype | Building Condition | Intervention |
|---|---|---|
| A | Good | Repair |
| B | Bad | Reconstruct |
| C | Bad | Reconstruct |
This worked, but I'm wondering if there's a faster, more efficient way of going about it. Maybe using `map` or `merge_asof`?
The documented approach in such a case is an indexing lookup on the underlying NumPy array:
```python
import numpy as np

# here df1 is bldg_df and df2 is matrix_df from the question

# factorize the requested column names
idx, cols = pd.factorize(df1['Building Condition'])

# reorder the matrix and slice the underlying array
df1['Intervention'] = (df2.reindex(index=df1['Subtype'], columns=cols)
                          .to_numpy()[np.arange(len(df1)), idx]
                       )
```
Output:

```
  Subtype Building Condition Intervention
0       A               Good       Repair
1       B                Bad  Reconstruct
2       C                Bad  Reconstruct
```
This is also the most efficient approach.
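For comparison, the `merge` the question asks about also works: reshape the matrix to long form with `stack`, then join on both keys. This is a sketch (frame names match the question), and it is usually slower than the NumPy lookup above:

```python
import pandas as pd

bldg_df = pd.DataFrame({'Subtype': ['A', 'B', 'C'],
                        'Building Condition': ['Good', 'Bad', 'Bad']})
matrix_df = pd.DataFrame({'Good': ['Repair', 'Retrofit', 'Reconstruct'],
                          'Bad': ['Retrofit', 'Reconstruct', 'Reconstruct']},
                         index=['A', 'B', 'C'])

# reshape the matrix to long form: one row per (Subtype, Condition) pair
long_form = (matrix_df
             .rename_axis(index='Subtype', columns='Building Condition')
             .stack()
             .rename('Intervention')
             .reset_index())

# a plain merge on both keys then recovers the Intervention column
out = bldg_df.merge(long_form, on=['Subtype', 'Building Condition'],
                    how='left')
```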