I have a dataframe that looks like this:
A B C
34 x a
3 y b
23 y a
40 x b
Essentially, columns B and C need to become dummy variables, with headers B_x, B_y, C_a, C_b. This is almost exactly how get_dummies() works in pandas, with one major difference: wherever a dummy variable would be 1, I need it to hold the value from column A instead. Something like:
A B_x B_y C_a C_b
34 34 0 34 0
3 0 3 0 3
23 0 23 23 0
40 40 0 0 40
I'm working with fairly large data with a high number of categories.
I've tried using get_dummies() on the dataset and then df.mask to change all 1's to df.A; however, this is atrociously slow (about 10 minutes).
Use pd.get_dummies and broadcast column A:
df2 = pd.get_dummies(df[['B', 'C']]) * df.A.values.reshape([-1,1])
B_x B_y C_a C_b
0 34 0 34 0
1 0 3 0 3
2 0 23 23 0
3 40 0 0 40
To add A back, there are many alternatives. You can do df2['A'] = df['A'], or use pd.concat:
pd.concat([df.A, df2], axis=1)
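For reference, here is a self-contained sketch of the same idea on the sample data from the question. It is not a different method, just the steps above put together. Two choices here are my own assumptions rather than part of the original answer: passing dtype=int to get_dummies (recent pandas versions return boolean dummies by default), and using DataFrame.mul with axis=0, which should be equivalent to the reshape-and-broadcast version but aligns on the index. The names dummies, scaled and result are only illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "A": [34, 3, 23, 40],
    "B": ["x", "y", "y", "x"],
    "C": ["a", "b", "a", "b"],
})

# One-hot encode B and C; dtype=int is an assumption to get 0/1 integers
# instead of booleans.
dummies = pd.get_dummies(df[["B", "C"]], dtype=int)

# Multiply every dummy column row-wise by A, so each 1 becomes that row's A value.
scaled = dummies.mul(df["A"], axis=0)

# Put A back in front of the scaled dummy columns.
result = pd.concat([df["A"], scaled], axis=1)
print(result)
```

Because this is a single vectorized multiplication rather than a mask over each column, it should avoid most of the overhead of the df.mask approach, though I have not timed it on data of your size.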