I am trying to find the z-score of each value within its group. For example, given the following data
df:
GROUP VALUE
1 5
2 2
1 10
2 20
1 7
Group 1 contains the values 5, 10, and 7, so I want the z-score of each of those values computed within Group 1 only.
Sample Desired Output:
GROUP VALUE Z_SCORE
1 5 0.5
2 2 0.01
1 10 7
2 20 8.3
1 7 1.3
The z-scores above are not actual calculated values; they only illustrate the desired format.
I tried the following:
import numpy as np
from scipy import stats

def z_score(x):
    z = np.abs(stats.zscore(x))
    return z

df['Z_SCORE'] = df.groupby(['GROUP'])['VALUE'].apply(z_score)
but I have not been able to get it to work. How can I achieve this?
Use GroupBy.transform instead of apply. It correctly converts the NumPy array returned for each group into a new Series aligned with the original index:
import numpy as np
from scipy.stats import zscore

def z_score(x):
    # absolute z-score of every value in one group
    z = np.abs(zscore(x))
    return z

df['Z_SCORE'] = df.groupby('GROUP')['VALUE'].transform(z_score)
print(df)
GROUP VALUE Z_SCORE
0 1 5 1.135550
1 2 2 1.000000
2 1 10 1.297771
3 2 20 1.000000
4 1 7 0.162221
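For comparison, here is a minimal sketch of the same calculation without scipy, assuming the population standard deviation (ddof=0), which is what scipy.stats.zscore uses by default. The names grouped, group_mean and group_std are illustrative only and do not appear in the question:

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({"GROUP": [1, 2, 1, 2, 1],
                   "VALUE": [5, 2, 10, 20, 7]})

grouped = df.groupby("GROUP")["VALUE"]

# Per-group mean and population standard deviation (ddof=0),
# broadcast back to the original row order by transform.
group_mean = grouped.transform("mean")
group_std = grouped.transform(lambda s: s.std(ddof=0))

# Absolute z-score of each value within its own group.
df["Z_SCORE"] = ((df["VALUE"] - group_mean) / group_std).abs()
print(df)
```

Because transform returns a result indexed like the original frame, the subtraction and division line up row by row without any extra index handling.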
A solution with GroupBy.apply is also possible, but the function must be changed to return a Series that carries the index of each group:
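def z_score(x):
    z = np.abs(zscore(x))
    # keep the group's original index so the result aligns with df
    return pd.Series(z, index=x.index)

# group_keys=False keeps the original index in the result (needed in pandas 2.0+)
df['Z_SCORE'] = df.groupby('GROUP', group_keys=False)['VALUE'].apply(z_score)
print(df)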
GROUP VALUE Z_SCORE
0 1 5 1.135550
1 2 2 1.000000
2 1 10 1.297771
3 2 20 1.000000
4 1 7 0.162221