I'm trying to normalize continuous variables first across the entire DF, then normalize again within group.
Here is a sample DF:
ave_win_last5 id_race
0 6.00 6734
1 3.25 6734
2 6.75 6734
3 5.50 6734
4 5.50 6734
I'm able to normalize across the whole DF by using:
x_var['ave_win_last5'] = (x_var['ave_win_last5']-x_var['ave_win_last5'].mean())/x_var['ave_win_last5'].std()
However, when I then attempt to normalize within each group, the output is all NaN:
x_var['ave_win_last5'] = (x_var['ave_win_last5'] -x_var.groupby('id_race')['ave_win_last5'].mean())/x_var.groupby('id_race')['ave_win_last5'].std()
ave_win_last5 id_race
0 NaN 6734
1 NaN 6734
2 NaN 6734
3 NaN 6734
4 NaN 6734
I'm not sure why this is returning NaN.
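The NaN values come from index alignment: groupby('id_race')['ave_win_last5'].mean() returns one value per group, indexed by id_race, so subtracting it from a column indexed 0 to 4 finds no matching labels. groupby.transform returns a result on the original row index, so one option is to move the normalization logic into transform: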
df['ave_win_last5'] = df.groupby('id_race').ave_win_last5.transform(lambda s: (s - s.mean()) / s.std())
df
# ave_win_last5 id_race
#0 0.459335 6734
#1 -1.645952 6734
#2 1.033505 6734
#3 0.076556 6734
#4 0.076556 6734
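For reference, here is a minimal self-contained sketch of both steps together, including the misaligned subtraction from the question. The second race (id 6735) and its values are made up for illustration so that there is more than one group; the variable names misaligned and group_means are also just placeholders.

```python
import pandas as pd

# The five rows from the question plus a hypothetical second race (6735).
x_var = pd.DataFrame({
    "ave_win_last5": [6.00, 3.25, 6.75, 5.50, 5.50, 2.00, 4.00, 9.00],
    "id_race": [6734, 6734, 6734, 6734, 6734, 6735, 6735, 6735],
})

col = x_var["ave_win_last5"]

# Why the original attempt gives NaN: the group means are indexed by
# id_race (6734, 6735), while the column is indexed by row label (0..7).
# Subtraction aligns on labels, and no labels are shared.
group_means = x_var.groupby("id_race")["ave_win_last5"].mean()
misaligned = col - group_means

# Step 1: normalize across the entire frame.
x_var["ave_win_last5"] = (col - col.mean()) / col.std()

# Step 2: normalize again within each race. transform returns a Series
# on the original row index, so it can be assigned straight back.
x_var["ave_win_last5"] = (
    x_var.groupby("id_race")["ave_win_last5"]
    .transform(lambda s: (s - s.mean()) / s.std())
)

print(x_var)
```

After step 2 each race should have mean 0 and standard deviation 1. Note that the within-group z-score is unaffected by the earlier global shift and rescale, so the rows for race 6734 come out the same as in the output above.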