I was able to find the proper formula for a Moving average here: Moving Average SO Question
The issue is it is using the 1 occurrence prior and the current rows input. I am trying to use the 2 prior occurrence to the row I am trying to predict.
import pandas as pd
import numpy as np
df = pd.DataFrame({'person':['john','mike','john','mike','john','mike'],
'pts':[10,9,2,2,5,5]})
df['avg'] = df.groupby('person')['pts'].transform(lambda x: x.rolling(2).mean())
From the output we see that Johns second entry is using his first and the current row to Avg. What I am looking for is John and Mikes last occurrences to be John: 6 and Mike: 5.5 using the prior two, not the previous one and the current rows input. I am using this for a prediction and would not know the current rows pts
because they haven't happend yet. New to Machine Learning and this was my first thought for a feature.
If want shift per groups add Series.shift
to lambda function:
df['avg'] = df.groupby('person')['pts'].transform(lambda x: x.rolling(2).mean().shift())
print (df)
person pts avg
0 john 10 NaN
1 mike 9 NaN
2 john 2 NaN
3 mike 2 NaN
4 john 5 6.0
5 mike 5 5.5