My dataframe priceDF begins like this:
Date symbol Close
0 2000-01-03 HELN.SW 28.28
1 2000-01-04 HELN.SW 27.50
2 2000-01-05 HELN.SW 26.71
3 2000-01-06 HELN.SW 27.16
4 2000-01-07 HELN.SW 27.50
To calculate the moving average of the Closes, grouped by symbol I did:
priceDF['avg'] = priceDF.groupby('symbol')'Close'].rolling(3).mean().reset_index(drop=True)
and I get:
Date symbol Close avg
0 2000-01-03 HELN.SW 28.28 NaN
1 2000-01-04 HELN.SW 27.50 NaN
2 2000-01-05 HELN.SW 26.71 12.537398
3 2000-01-06 HELN.SW 27.16 12.022164
4 2000-01-07 HELN.SW 27.50 11.922733
In row 2, I want avg = 27.50, the average of the Closes in rows 0 to 2. What do I misunderstand?
I am pretty sure this is an issue with how reset_index
is used here,in conjuction with having multiple groups. Let us consider a slightly extended example:
from io import StringIO
data = StringIO(
'''
Date symbol Close
0 2000-01-03 HELN.SW 28.28
1 2000-01-04 HELN.SW 27.50
2 2000-01-05 HELN.SW 26.71
3 2000-01-06 HELN.SW 27.16
4 2000-01-07 HELN.SW 27.50
5 2000-01-07 AAAA.SW 30.00
''')
df = pd.read_csv(data, sep = '\s+', index_col=0)
(we have added AAAA.SW to the list)
Now this:
df.groupby('symbol')['Close'].rolling(3).mean()
produces sensible numbers (as there is only one date for AAAA we expect NaN:
symbol
AAAA.SW 5 NaN
HELN.SW 0 NaN
1 NaN
2 27.496667
3 27.123333
4 27.123333
Name: Close, dtype: float64
but this:
df.groupby('symbol')['Close'].rolling(3).mean().reset_index(drop=True)
produces wrong indices
0 NaN
1 NaN
2 NaN
3 27.496667
4 27.123333
5 27.123333
Name: Close, dtype: float64
and when put into the original df end up in the wrong rows:
df['avg'] = df.groupby('symbol')['Close'].rolling(3).mean().reset_index(drop=True)
df
produces
Date symbol Close avg
0 2000-01-03 HELN.SW 28.28 NaN
1 2000-01-04 HELN.SW 27.50 NaN
2 2000-01-05 HELN.SW 26.71 NaN
3 2000-01-06 HELN.SW 27.16 27.496667
4 2000-01-07 HELN.SW 27.50 27.123333
5 2000-01-07 AAAA.SW 30.00 27.123333
a solution is to do the aqssignments within each group, like this:
df.groupby('symbol').apply(lambda g: g.assign(avg = g['Close'].rolling(3).mean())).reset_index(drop=True)
so we get
Date symbol Close avg
0 2000-01-07 AAAA.SW 30.00 NaN
1 2000-01-03 HELN.SW 28.28 NaN
2 2000-01-04 HELN.SW 27.50 NaN
3 2000-01-05 HELN.SW 26.71 27.496667
4 2000-01-06 HELN.SW 27.16 27.123333
5 2000-01-07 HELN.SW 27.50 27.123333