I've looked through the forums and can't seem to figure this out. I have the following data. I assume the answer lies in the "groupby" function but I can't seem to work it out.
Date Hour Value 3DAverage
1/1 1 57 53.33
1/1 2 43 42.33
1/1 3 44 45.33
1/2 1 51 ...
1/2 2 40 ...
1/2 3 42 ...
1/3 1 56 ...
1/3 2 42
1/3 3 48
1/4 1 53
1/4 2 45
1/4 3 46
1/5 1 56
1/5 2 46
1/5 3 48
1/5 4 64 *
1/6 1 50
1/6 2 41
1/6 3 42
1/7 1 57
1/7 2 43
1/7 3 45
1/8 1 58
1/8 2 49
1/8 3 41
1/9 1 53
1/9 2 46
1/9 3 47
1/10 1 58
1/10 2 49
1/10 3 40
What I am trying to do is add the "3DAverage" column. I would like this column to produce an average of the "Value" column for the PRIOR 3 corresponding hour values. I want to fill this column down for the entire series. For example, the value 53.33 is an average of the value for hour 1 on 1/2, 1/3, and 1/4. I would like this to continue down the column using only the prior 3 values for each "HourValue".
Also, please note that there are instances such as 1/5 hour 4. Not all dates have the same number of hours, so I am looking for the last 3 hour values for dates in which those hours exist.
I hope that makes sense. Thanks so much in advance for your help !
You can groupby
on Date column and do the following:
df['3DAverage'] = df['Hour'].map(df.groupby('Hour').apply(lambda x: x.loc[x['Date'].isin(['1/2','1/3','1/4']),'Value'].mean()))
df.head(6)
Date Hour Value 3DAverage
0 1/1 1 57 53.333333
1 1/1 2 43 42.333333
2 1/1 3 44 45.333333
3 1/2 1 51 53.333333
4 1/2 2 40 42.333333
5 1/2 3 42 45.333333