i have data frame like this,
id | value |
---|---|
a | 2 |
a | 4 |
a | 3 |
a | 5 |
b | 1 |
b | 4 |
b | 3 |
c | 1 |
c | nan |
c | 5 |
the resulted data frame contain new column ['average'] and to get its values will be:
the resulted data frame must be :
id | value | average |
---|---|---|
a | 2 | 2 |
a | 4 | 2 |
a | 3 | 3 |
a | 5 | 3 |
b | 1 | 1 |
b | 4 | 1 |
b | 3 | 2.5 |
c | 1 | 1 |
c | nan | 1 |
c | 5 | 1 |
You can group the dataframe by id
, then calculate the expanding mean for value
column for each groups, then shift the expanding mean and get it back to the original dataframe, once you have it, you just need to ffill
on axis=1
on for the value
and average
columns to get the first value for the categories:
out = (df
.assign(average=df
.groupby(['id'])['value']
.transform(lambda x: x.expanding().mean().shift(1))
)
)
out[['value', 'average']] = out[['value', 'average']].ffill(axis=1)
OUTPUT:
id value average
0 a 2.0 2.0
1 a 4.0 2.0
2 a 3.0 3.0
3 a 5.0 3.0
4 b 1.0 1.0
5 b 4.0 1.0
6 b 3.0 2.5
7 c 1.0 1.0
8 c NaN 1.0
9 c 5.0 1.0