Search code examples
pythonpandaspandas-groupby

How to make a new pandas column that's the average of the last 3 values?


Let's say I have a dataframe with 3 columns, dt, unit, sold. What I would like to know how to do is how to create a new column called say, prior_3_avg, that is as the name suggests, an average of sold by unit for the past three same-day-of-week as dt. E.g., for unit "1" on May 5th 2020, what's the average it sold on April 28th, 21st, and 14th, which are the last three thursdays?

Toy sample data:

df = pd.DataFrame({'dt':['2020-5-1','2020-5-2','2020-5-3','2020-5-4','2020-5-5','2020-5-6','2020-5-7','2020-5-8','2020-5-9','2020-5-10','2020-5-11','2020-5-12','2020-5-13','2020-5-14','2020-5-15','2020-5-16','2020-5-17','2020-5-18','2020-5-19','2020-5-20','2020-5-21','2020-5-22','2020-5-23','2020-5-24','2020-5-25','2020-5-26','2020-5-27','2020-5-28','2020-5-1','2020-5-2','2020-5-3','2020-5-4','2020-5-5','2020-5-6','2020-5-7','2020-5-8','2020-5-9','2020-5-10','2020-5-11','2020-5-12','2020-5-13','2020-5-14','2020-5-15','2020-5-16','2020-5-17','2020-5-18','2020-5-19','2020-5-20','2020-5-21','2020-5-22','2020-5-23','2020-5-24','2020-5-25','2020-5-26','2020-5-27','2020-5-28',],'unit':[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2],'sold':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28]})

df['dt'] = pd.to_datetime(df['dt'])

           dt  unit  sold 
0  2020-05-01     1     1
1  2020-05-02     1     2
2  2020-05-03     1     3
3  2020-05-04     1     4
4  2020-05-05     1     5
5  2020-05-06     1     6
...

How would I go about this? I've seen: Pandas new column from groupby averages

That explains how to just do a group by on the columns. I figure I could do a "day of week" column, but then I still have the same problem of wanting to limit to the past 3 matching day of week values instead of just all of the results.

It could possibly have something to do with this, but this looks more like it's useful for one-off analysis than making a new column: limit amount of rows as result of groupby Pandas


Solution

  • This should work:

    df['dayofweek'] = df['dt'].dt.dayofweek
    df['output'] = df.apply(lambda x: df['sold'][(df.index < x.name) & (df.dayofweek == x.dayofweek)].tail(3).sum(), axis = 1)