Search code examples
pythonpandasmean

How to get average between first row and current row per each group in data frame?


i have data frame like this,

id value
a 2
a 4
a 3
a 5
b 1
b 4
b 3
c 1
c nan
c 5

the resulted data frame contain new column ['average'] and to get its values will be:

  • make group-by(id)
  • first row in 'average' column per each group is equal to its corresponding value in 'value'
  • other rows in ' average' in group is equal to mean for all previous rows in 'value'(except current value)

the resulted data frame must be :

id value average
a 2 2
a 4 2
a 3 3
a 5 3
b 1 1
b 4 1
b 3 2.5
c 1 1
c nan 1
c 5 1

Solution

  • You can group the dataframe by id, then calculate the expanding mean for value column for each groups, then shift the expanding mean and get it back to the original dataframe, once you have it, you just need to ffill on axis=1 on for the value and average columns to get the first value for the categories:

    out = (df
           .assign(average=df
                   .groupby(['id'])['value']
                   .transform(lambda x: x.expanding().mean().shift(1))
                   )
           )
    out[['value', 'average']] = out[['value', 'average']].ffill(axis=1)
    

    OUTPUT:

      id  value  average
    0  a    2.0      2.0
    1  a    4.0      2.0
    2  a    3.0      3.0
    3  a    5.0      3.0
    4  b    1.0      1.0
    5  b    4.0      1.0
    6  b    3.0      2.5
    7  c    1.0      1.0
    8  c    NaN      1.0
    9  c    5.0      1.0