I want to average only the last 5 rows of different groups contained in one column in a dataframe and make a new dataframe containing two columns


I have a dataframe with four columns. In the column 'Intensity' there are 3 groups (0, 50, 100). I would like to average only the last 2 values of column 'Value' within each of the 3 groups of column 'Intensity'. Then I would like to make a new dataframe with the columns 'Replication', 'Regime', 'Intensity', 'Value_mean' and 'Value_sd', the last two being the calculated average and standard deviation.

Replication   Regime   Intensity   Value
1             Ctrl     0           2
1             Ctrl     0           3
1             Ctrl     0           4
1             Ctrl     0           5
1             Ctrl     0           6
1             Ctrl     0           7
1             Ctrl     50          1
1             Ctrl     50          2
1             Ctrl     50          2
1             Ctrl     50          4
1             Ctrl     50          6
1             Ctrl     50          6
1             Ctrl     100         2
1             Ctrl     100         1
1             Ctrl     100         0
2             Ctrl     100         3
2             Ctrl     0           7
2             Ctrl     0           3
2             Ctrl     0           6
2             Ctrl     0           2
2             Ctrl     0           1
2             Ctrl     0           5
2             Ctrl     50          12
2             Ctrl     50          22
2             Ctrl     50          52
2             Ctrl     50          22
2             Ctrl     50          2
2             Ctrl     50          2
2             Ctrl     100         22
2             Ctrl     100         2
2             Ctrl     100         25

So far I have used the apply function, but I get a series instead of a dataframe:

 df2 = df1.groupby(['Regime','Intensity']).apply(lambda x: 
       x.tail(3).mean(axis=0,level=0))
 

and I get

                   Intensity      A
 Regime Intensity
 Ctrl   0                  0  -0.87
        50                50   2.08
        100              100   4.84
  

Solution

  • Use GroupBy.tail in the first step to keep the last rows of each group, then create the new columns with GroupBy.transform:

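    # Keep only the last 3 rows of each (Regime, Intensity) group
    df2 = df1.groupby(['Regime', 'Intensity']).tail(3).copy()

    # Broadcast the mean and standard deviation of Value back onto each row;
    # grouping by 'Regime' alone pools all intensities of a regime together
    df2['mean_val'] = df2.groupby('Regime')['Value'].transform('mean')
    df2['std_val'] = df2.groupby('Regime')['Value'].transform('std')
    print(df2)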
        Replication Regime  Intensity  Value  mean_val    std_val
    19            2   Ctrl          0      2  9.222222  10.425663
    20            2   Ctrl          0      1  9.222222  10.425663
    21            2   Ctrl          0      5  9.222222  10.425663
    25            2   Ctrl         50     22  9.222222  10.425663
    26            2   Ctrl         50      2  9.222222  10.425663
    27            2   Ctrl         50      2  9.222222  10.425663
    28            2   Ctrl        100     22  9.222222  10.425663
    29            2   Ctrl        100      2  9.222222  10.425663
    30            2   Ctrl        100     25  9.222222  10.425663
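
Because the transform above groups by 'Regime' only, every row gets the same mean and standard deviation. If the goal is one summary row per group with 'Value_mean' and 'Value_sd' columns, as the question describes, a possible variation is to aggregate instead of transform. The following is only a sketch under assumptions: `summarize_tail` is a hypothetical helper name (not a pandas function), the grouping keys and n=3 are illustrative choices, and the sample frame contains just the Replication 2 rows from the question.

```python
import pandas as pd


def summarize_tail(df, keys, n=3, value_col="Value"):
    """Mean and sample standard deviation of the last n rows of each group.

    Hypothetical helper for illustration; not part of pandas.
    """
    # GroupBy.tail keeps the last n rows of every group, in original row order
    last_rows = df.groupby(keys).tail(n)
    # Named aggregation collapses each group to a single summary row
    return (
        last_rows.groupby(keys)[value_col]
        .agg(Value_mean="mean", Value_sd="std")
        .reset_index()
    )


# Sample data: the Replication 2 rows from the question, in the same order
rows = (
    [(2, "Ctrl", 100, 3)]
    + [(2, "Ctrl", 0, v) for v in [7, 3, 6, 2, 1, 5]]
    + [(2, "Ctrl", 50, v) for v in [12, 22, 52, 22, 2, 2]]
    + [(2, "Ctrl", 100, v) for v in [22, 2, 25]]
)
df1 = pd.DataFrame(rows, columns=["Replication", "Regime", "Intensity", "Value"])

summary = summarize_tail(df1, ["Replication", "Regime", "Intensity"], n=3)
print(summary)
```

Including 'Replication' in the keys keeps it as a column in the result and takes the last rows separately for each replication; drop it from `keys` to pool replications, and change `n` to match how many trailing rows should be averaged.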