
Pandas: z-score within groups


I am trying to compute the z-score of each value within its own group. For example, given the following data:

df:

GROUP VALUE
 1     5
 2     2
 1     10
 2     20
 1     7

Group 1 contains the values 5, 10 and 7, so I want the z-score of each of those values computed within Group 1 only.

Sample desired output:

GROUP VALUE Z_SCORE
 1     5     0.5
 2     2     0.01
 1     10    7
 2     20    8.3
 1     7     1.3

The z-scores above are not actual calculated values, just an illustration of the layout.

I am trying the following:

def z_score(x):
   z = np.abs(stats.zscore(x))
   return z

df['Z_SCORE'] = df.groupby(['GROUP'])['Value'].apply(z_score)

but it does not work. How can I achieve this?


Solution

  • Use GroupBy.transform instead of apply. transform takes the NumPy array returned for each group and converts it into a new Series aligned with the original index:

    import numpy as np
    from scipy.stats import zscore
    
    def z_score(x):
       # absolute z-score of each value within its group
       z = np.abs(zscore(x))
       return z
    
    df['Z_SCORE'] = df.groupby('GROUP')['VALUE'].transform(z_score)
    
    print(df)
       GROUP  VALUE   Z_SCORE
    0      1      5  1.135550
    1      2      2  1.000000
    2      1     10  1.297771
    3      2     20  1.000000
    4      1      7  0.162221
    

    A solution with GroupBy.apply is also possible, but the function has to return a Series that carries the index of each group, so the result can be aligned with the original rows:

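    def z_score(x):
       z = np.abs(zscore(x))
       # keep the group's own index so the values align with df
       return pd.Series(z, index=x.index)
    
    
    # group_keys=False keeps the original row index in the result;
    # pandas 2.0+ would otherwise prepend GROUP to the index
    df['Z_SCORE'] = df.groupby('GROUP', group_keys=False)['VALUE'].apply(z_score)
    print(df)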
       GROUP  VALUE   Z_SCORE
    0      1      5  1.135550
    1      2      2  1.000000
    2      1     10  1.297771
    3      2     20  1.000000
    4      1      7  0.162221
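
If SciPy is not available, the same numbers can probably be reproduced with pandas alone. The following is a minimal sketch, not part of the original answer: it assumes the sample frame from the question and uses the population standard deviation (`ddof=0`), which is the default of `scipy.stats.zscore`, whereas `Series.std` defaults to `ddof=1`.

```python
import pandas as pd

# sample data from the question
df = pd.DataFrame({"GROUP": [1, 2, 1, 2, 1], "VALUE": [5, 2, 10, 20, 7]})

# absolute z-score per group; ddof=0 matches the default of scipy.stats.zscore
df["Z_SCORE"] = df.groupby("GROUP")["VALUE"].transform(
    lambda x: ((x - x.mean()) / x.std(ddof=0)).abs()
)

print(df)
```

Note that a group with a single row, or with identical values, has a standard deviation of 0, so its z-scores come out as NaN and may need separate handling.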