Pandas groupby and correct with median in new column


My dataframe looks like this:

Plate Sample LogRatio
 P1     S1     0.42
 P1     S2     0.23 
 P2     S3     0.41 
 P3     S4     0.36 
 P3     S5     0.18

I have calculated the median of each plate (though this is probably not the best way to start):

grouped = df.groupby("Plate")
medianesPlate = grouped["LogRatio"].median() 

I want to add a column to my dataframe:

CorrectedLogRatio = LogRatio-median(plate)

I suppose with something like:

df["CorrectedLogRatio"] = LogRatio-median(plate)

to get something like this:

Plate Sample LogRatio CorrectedLogRatio
 P1     S1     0.42    0.42-median(P1)   
 P1     S2     0.23    0.23-median(P1)
 P2     S3     0.41    0.41-median(P2)
 P3     S4     0.36    0.36-median(P3)
 P3     S5     0.18    0.18-median(P3)

But I don't know how to get the median for each row from medianesPlate. I tried some apply and transform functions, but they didn't work. Thanks for any help.


Solution

  • You can use transform:

    df['CorrectedLogRatio'] = df['LogRatio'] - df.groupby('Plate')['LogRatio'].transform('median')
    

    The resulting output:

      Plate Sample  LogRatio  CorrectedLogRatio
    0    P1     S1      0.42              0.095
    1    P1     S2      0.23             -0.095
    2    P2     S3      0.41              0.000
    3    P3     S4      0.36              0.090
    4    P3     S5      0.18             -0.090
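If you want to reuse the per-plate medians already computed in the question, one possible alternative is to look them up row by row with map. The sketch below rebuilds the sample data from the question and checks that this gives the same values as transform. The names corrected_via_map and corrected_via_transform are only illustrative and are not part of the original code.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Plate": ["P1", "P1", "P2", "P3", "P3"],
    "Sample": ["S1", "S2", "S3", "S4", "S5"],
    "LogRatio": [0.42, 0.23, 0.41, 0.36, 0.18],
})

# One median per plate, indexed by plate name (same as in the question)
medianesPlate = df.groupby("Plate")["LogRatio"].median()

# map looks up each row's plate in that Series, giving one median per row
corrected_via_map = df["LogRatio"] - df["Plate"].map(medianesPlate)

# transform computes the medians and broadcasts them back in one step
corrected_via_transform = (
    df["LogRatio"] - df.groupby("Plate")["LogRatio"].transform("median")
)

# Either result can be assigned as the new column
df["CorrectedLogRatio"] = corrected_via_map
print(df)
```

The aggregated median() has one row per plate, so it cannot be subtracted from df['LogRatio'] directly. Both map and transform produce a Series with the same index as df, which is what makes the subtraction line up.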